auxiliaryField with multiple regexes


Post by kfiles » Sun Jun 09, 2013 4:30 am

I noticed that the XML schema for auxiliaryField has changed between versions of regain: the online help shows the regex as the text content of auxiliaryField, while the sample CrawlerConfiguration.xml shows it in a separate <regex> child element. I took this to mean that you can now have multiple regexes for a field. However, it did not work as I expected.

I used the following definition:

    <auxiliaryField name="router" regexGroup="1" toLowercase="true">
      <regex>mert.cvs/reports/.*ByRouter/([A-Za-z0-9.]+)</regex>
      <regex>mert.cvs/configs/([A-Za-z0-9.]+)\.cfg</regex>
      <regex>mert.cvs/configs-[a-z]+/([A-Za-z0-9.]+)\.cfg</regex>
    </auxiliaryField>


I expected that URLs matching any of the three regexes would get the "router" field. What actually happened is that only URLs matching the first regex got it; the others silently failed. Replacing the definition above with three separate auxiliaryField elements, each containing a single regex, worked fine.
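For reference, the working three-element variant might look like this. It is a sketch that simply splits the definition above into one auxiliaryField per regex, reusing the same name and attributes; it assumes regain accepts several auxiliaryField elements with the same name, which is what the observed behaviour suggests rather than something confirmed by the documentation.

```xml
<!-- One auxiliaryField per regex, all populating the same "router" field. -->
<!-- [A-Za-z0-9.] is used instead of [A-z0-9.], since A-z also matches [ \ ] ^ _ and ` -->
<auxiliaryField name="router" regexGroup="1" toLowercase="true">
  <regex>mert.cvs/reports/.*ByRouter/([A-Za-z0-9.]+)</regex>
</auxiliaryField>
<auxiliaryField name="router" regexGroup="1" toLowercase="true">
  <regex>mert.cvs/configs/([A-Za-z0-9.]+)\.cfg</regex>
</auxiliaryField>
<auxiliaryField name="router" regexGroup="1" toLowercase="true">
  <regex>mert.cvs/configs-[a-z]+/([A-Za-z0-9.]+)\.cfg</regex>
</auxiliaryField>
```

Since the three patterns cover disjoint URL paths, each URL should match at most one element, so the order of the elements should not matter here.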

Is this expected? If only a single child is allowed, why doesn't the above configuration produce an error?

I would also like to know what the "store" and "tokenize" attributes of auxiliaryField do. They are used in the sample CrawlerConfiguration.xml but are not documented in the online help.
