Handling download errors when spidering


Hex Angel
11-22-2008, 06:51 PM
Maybe you've got plans for this in Ver. 6...

I ran a spider overnight -- actually, it takes about 36 hours to complete. I'm not sure what happened: either my internet connection went down, or my site went down, or something else. Anyway, Zoom uploaded the data files of a partial index, which kinda sucks. It's going to take at least 36 hours to create another.

This just happens from time to time during an index, and it'd be great if Zoom could be configured to handle these conditions more gracefully, e.g.:

1> Set an error threshold in the configuration -- when that number of errors occurs, pause or stop the index.

2> Provide multiple retries to retrieve a file that fails on the first attempt.

3> Keep a list of failed files that can be used to run an incremental index later.

4> Provide the option to BACKUP all existing index files before uploading the new ones.
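
To make points 1 to 3 concrete, here is a rough Python sketch of the behaviour I have in mind. It is not Zoom's code or API; the `crawl` function, the `fetch` callable and both parameter names are made up for illustration, and I'm assuming a failed download shows up as an `OSError`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_attempts: int = 3,
    error_threshold: int = 10,
) -> Tuple[Dict[str, str], List[str], bool]:
    """Fetch each URL with retries and stop once too many URLs have failed.

    Returns (pages, failed, completed). `completed` is False when the
    error threshold ended the run early. All names here are hypothetical.
    """
    pages: Dict[str, str] = {}
    failed: List[str] = []
    for url in urls:
        for _attempt in range(max_attempts):
            try:
                pages[url] = fetch(url)
                break  # success, move on to the next URL
            except OSError:
                pass  # assumed failure type; try this URL again
        else:
            # Every attempt failed: remember the URL for a later re-run.
            failed.append(url)
            if len(failed) >= error_threshold:
                return pages, failed, False
    return pages, failed, True
```

If `completed` comes back False, the partial index would not be uploaded, and the `failed` list could feed a later pass over just those files.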

Thanks for your consideration.

Cheers,

Patrick.

wrensoft
11-22-2008, 11:53 PM
In V6 it will be easier to get a list of errors that occurred.

But no other changes have been implemented in this area. Multiple retries only occasionally make sense. In most cases a broken link will stay broken for the duration of the indexing process (on most sites that have bad pages, the pages have been bad for months or years). And it is hard to always be sure whether a download failure is due to a local ISP error or a remote server error.

Stopping after X errors might make more sense.

If it takes you 36 hours to rebuild the index, then what I would suggest, however, is that you back up your current good set of index files from time to time. You should do this anyway, in case of hard disk failure or the like.
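
As an illustration only, a periodic backup could be as simple as copying the index folder to a timestamped location with a small script. The Python sketch below is not part of Zoom; the function name and both folder arguments are hypothetical and would need to match wherever your index files actually live.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir: str, backup_root: str) -> Path:
    """Copy index_dir into a new timestamped folder under backup_root.

    Both paths are placeholders chosen by the caller.
    """
    source = Path(index_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"index folder not found: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"index-{stamp}"
    # copytree creates the target (and any missing parents) and raises
    # an error if a backup with the same timestamp already exists.
    shutil.copytree(source, target)
    return target
```

Running something like this before each indexing session means a partial upload can be replaced by the last good copy instead of a 36-hour rebuild.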