Indexing halts on CRC match
burnsl
04-19-2005, 07:15 PM
I have noticed that my indexer stops indexing when a CRC match of a second document is found.
Anyone seen this?
Can you describe how it "stops indexing"? E.g., does it finish and create the index files, or does it just sit there as if it's still looking for files (and do so for a very long time)?
Can you also check that you are using the latest build of Zoom available (Version 4.0 Build 1016):
http://www.wrensoft.com/zoom/whatsnew.html
If this continues to happen, save the index log after indexing ("File"->"Save index log to file"), make sure you have Verbose mode on, and e-mail the log as well as your ZCFG file to us via zoom [at] wrensoft [dot] com.
burnsl
04-20-2005, 05:29 AM
It just halts.
No saving of files.
It just sits there.
I'll test it with 1016.
burnsl
04-21-2005, 09:08 PM
Testing with the latest build results in the same problem.
It halts (hangs).
If I click stop, it completes the indexing and uploads the new indexes.
See the line near the end:
"15:09:11 - [SKIPPED] Skipping http://www.wdf.org/ (Identical page found: CRC-32 signature matched)"
Please see the log:
http://www.conway.com/search/cdi-indexes-log.txt
Think we've found the problem. It's actually when you have a start point which fails a CRC-32 check. Zoom then doesn't realize it should move on to the next start point (or stop indexing).
We'll fix this bug in Version 4.1. In the meantime, you can remove this start point (http://www.wdf.org/) or disable CRC-32.
The reason it fails, by the way, is http://www.conway.com/wdf/, which was indexed earlier and is the same page.
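To illustrate the behaviour described above, here is a minimal Python sketch of CRC-32 duplicate detection across several start points. It is not Zoom's code: the function name `index_start_points`, the `fetch` callback, and the page contents are all made up for the example. The point is only that a start point whose signature has already been seen should be skipped and the loop should carry on to the next start point.

```python
import zlib

def index_start_points(start_points, fetch):
    """Index each start point, skipping pages whose CRC-32 was already seen.

    `fetch` is a hypothetical callback that returns the page body as bytes.
    """
    seen = {}      # CRC-32 signature -> first URL indexed with that content
    indexed = []   # URLs that were actually indexed
    skipped = []   # (skipped URL, URL of the identical page indexed earlier)
    for url in start_points:
        signature = zlib.crc32(fetch(url))
        if signature in seen:
            skipped.append((url, seen[signature]))
            continue  # duplicate start point: move on instead of waiting
        seen[signature] = url
        indexed.append(url)
    return indexed, skipped

# Made-up page bodies standing in for the two identical pages in the thread.
pages = {
    "http://www.conway.com/wdf/": b"<html>WDF home</html>",
    "http://www.wdf.org/": b"<html>WDF home</html>",
}
indexed, skipped = index_start_points(list(pages), pages.__getitem__)
print(indexed)
print(skipped)
```

Here the second start point is recorded as skipped and the loop finishes normally, which is the behaviour the fix is meant to restore.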
burnsl
04-22-2005, 02:16 AM
Would it work out better to tell it to ignore the directory http://www.conway.com/wdf/?
Then when it hit the wdf domain, would it work?
I'll ask before looking, but how do I tell it to ignore the http://www.conway.com/wdf/ path?
burnsl
04-22-2005, 02:36 AM
I added
http://www.conway.com/wdf/
to the skip list.
It still hung at the same place.
I also looked in the logs. The earlier log shows that it found and indexed the http://www.conway.com/wdf/ path, and the new log with the skip instruction shows no evidence that the string "http://www.conway.com/wdf/" was read.
So it skipped it and still hung.
Perhaps another URL somewhere else on your site (or one of the ones you're indexing) has another copy of the same page?
What you should try is to rearrange your start points so that "http://www.wdf.org/" is first. Because the order of the start points changes the order in which pages are indexed, this site would be indexed first, and the other "copies" of this page would then be skipped accordingly.
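A small Python sketch of why the order matters, assuming a simple "first copy wins" rule keyed on the CRC-32 of the page body. The function name `indexed_urls` and the page content are invented for illustration and are not taken from Zoom.

```python
import zlib

def indexed_urls(ordered_pages):
    """Return the URLs kept when later pages with a matching CRC-32 are skipped."""
    seen = set()
    kept = []
    for url, body in ordered_pages:
        signature = zlib.crc32(body)
        if signature not in seen:
            seen.add(signature)
            kept.append(url)
    return kept

same_body = b"<html>identical page</html>"  # made-up content shared by both URLs
mirror = ("http://www.conway.com/wdf/", same_body)
primary = ("http://www.wdf.org/", same_body)

print(indexed_urls([mirror, primary]))  # the conway.com copy is kept
print(indexed_urls([primary, mirror]))  # the wdf.org copy is kept
```

Whichever URL is visited first is the one that ends up in the index, so putting a start point first means it is never the copy that gets skipped.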
burnsl
04-22-2005, 02:16 PM
This is strange, because I re-ran the index after removing the "*" from the end of the path in question (I thought it was required), and now it flies by the start point with CRC-32 enabled.
Good call!
Glad to hear that you've got it working.
Just to clarify: the skip pages list does not support wildcards (e.g. "*"). It works more like a list of keywords, each of which is matched against the URL. There are some examples in the Help file if you need more info. Hope that helps.
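As a rough illustration of that difference, the Python sketch below treats each skip-list entry as a plain substring of the URL. The function `should_skip` and the sample URL are hypothetical, and details such as case sensitivity are assumptions rather than documented Zoom behaviour.

```python
def should_skip(url, skip_list):
    """Return True if any skip-list entry occurs anywhere in the URL.

    Entries are plain substrings, so a "*" is matched literally
    rather than acting as a wildcard.
    """
    return any(entry in url for entry in skip_list)

url = "http://www.conway.com/wdf/index.html"  # made-up page under the skipped path

print(should_skip(url, ["http://www.conway.com/wdf/"]))   # entry is a substring of the URL
print(should_skip(url, ["http://www.conway.com/wdf/*"]))  # literal "*" never appears in the URL
```

Under this reading, an entry ending in "*" only matches URLs that contain an actual asterisk, so the entry without the "*" is the one that takes effect.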