Hello,
Thank you in advance for any insight or assistance. I am new to Regain. I recently installed it and indexed a local directory without any issues; everything worked great. This morning I pointed the crawler at another local directory that is significantly larger (6000+ directories and 35000+ files), made up of images, PDFs, Microsoft Office documents, etc.
The crawler indexed content without problems for more than an hour, but now all it prints to standard output is the following line, repeated over and over. It appears to be stuck in an infinite loop that it cannot get out of.
11:08:38: Invalid dictionary, found:? but expected:''
The process has created an index directory and the corresponding index files. I also looked at the crawler log and did not see anything else out of the ordinary. Not knowing whether this behavior is normal, I killed the crawler process two hours into the indexing. The last log entry is as follows:
2010-01-27 10:12:05 [main] ERROR: Preparing file://F%3A/A-PROPOSALS/MASTER+100%25Rev7+CD.xls with preparator net.sf.regain.crawler.preparator.PoiMsOfficePreparator failed
net.sf.regain.RegainException: Preparing file://F%3A/A-PROPOSALS/MASTER+100%25Rev7+CD.xls with preparator net.sf.regain.crawler.preparator.PoiMsOfficePreparator failed
    at net.sf.regain.crawler.document.DocumentFactory.createDocument(DocumentFactory.java:350)
    at net.sf.regain.crawler.document.DocumentFactory.createDocument(DocumentFactory.java:273)
    at net.sf.regain.crawler.IndexWriterManager.createNewIndexEntry(IndexWriterManager.java:737)
    at net.sf.regain.crawler.IndexWriterManager.addToIndex(IndexWriterManager.java:720)
    at net.sf.regain.crawler.Crawler.run(Crawler.java:559)
    at net.sf.regain.crawler.Main.main(Main.java:137)
Caused by: net.sf.regain.RegainException: Reading MS* (OpenXML) document failed : file://F%3A/A-PROPOSALS/MASTER+100%25Rev7+CD.xls
    at net.sf.regain.crawler.preparator.PoiMsOfficePreparator.prepare(PoiMsOfficePreparator.java:91)
    at net.sf.regain.crawler.document.DocumentFactory.createDocument(DocumentFactory.java:335)
    ... 5 more
Caused by: java.lang.IllegalStateException: bad text '&A'.
    at org.apache.poi.hssf.usermodel.HeaderFooter.splitParts(HeaderFooter.java:77)
    at org.apache.poi.hssf.usermodel.HeaderFooter.getLeft(HeaderFooter.java:87)
    at org.apache.poi.hssf.extractor.ExcelExtractor._extractHeaderFooter(ExcelExtractor.java:395)
    at org.apache.poi.hssf.extractor.ExcelExtractor.getText(ExcelExtractor.java:385)
    at net.sf.regain.crawler.preparator.PoiMsOfficePreparator.prepare(PoiMsOfficePreparator.java:84)
    ... 6 more
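
In case it helps anyone trying to reproduce this: the file path in the log entry is URL-encoded, so here is a minimal sketch (not part of Regain) for pulling the failing files out of the crawler log in readable form. The function name `failed_files` and the regular expression are my own, and I am assuming that `+` stands for a space in these paths, which I have not confirmed against Regain's encoding.

```python
import re
from urllib.parse import unquote_plus

# Matches the ERROR line format seen in the log entry above.
FAILED = re.compile(r"ERROR: Preparing file://(\S+) with preparator (\S+) failed")


def failed_files(log_lines):
    """Return (decoded_path, preparator) pairs for each failed-preparation ERROR line."""
    results = []
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            # Assumption: '+' encodes a space, so unquote_plus is used instead of unquote.
            results.append((unquote_plus(match.group(1)), match.group(2)))
    return results


sample = [
    "2010-01-27 10:12:05 [main] ERROR: Preparing "
    "file://F%3A/A-PROPOSALS/MASTER+100%25Rev7+CD.xls with preparator "
    "net.sf.regain.crawler.preparator.PoiMsOfficePreparator failed"
]
for path, preparator in failed_files(sample):
    print(path, "<-", preparator)
```

Under that assumption, the file in my log entry decodes to `F:/A-PROPOSALS/MASTER 100%Rev7 CD.xls`, which is the spreadsheet whose header/footer text POI rejects with `bad text '&A'`.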
