
Handling download errors when spidering


  • #1

    Maybe you've got plans for this in Ver. 6...

    I ran a spider overnight -- actually, it takes about 36 hours to complete. Not sure what happened, but either my internet went down, or my site went down, or... I'm not sure what. Anyway, Zoom uploaded the data files of a partial index, which kinda sucks. It's going to take at least another 36 hours to create a new one.

    This just happens from time to time during an index, and it'd be great if Zoom could be configured to handle these conditions more gracefully, e.g. (a rough sketch of what I have in mind follows the list):

    1> Set an error threshold in the configuration -- when that number of errors occurs, pause or stop the index.

    2> Provide multiple retries to retrieve a file that fails on the first attempt.

    3> Keep a list of failed files that can be used to run an incremental index later.

    4> Provide the option to BACK UP all existing index files before uploading the new ones.
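
    To make that concrete, here's a rough Python sketch of the behaviour I'm asking for. Everything in it -- the names, the numbers, the work queue -- is made up for illustration; it's not Zoom's actual code or settings.

    import time
    import urllib.error
    import urllib.request

    MAX_RETRIES = 3        # request 2: retry a failed download a few times
    ERROR_THRESHOLD = 50   # request 1: pause/stop once this many URLs have failed

    urls_to_spider = ["http://example.com/"]  # stand-in for the spider's work queue
    failed_urls = []                          # request 3: saved for a later re-run
    error_count = 0

    def fetch(url):
        """Try to download one URL, backing off between attempts."""
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    return resp.read()
            except urllib.error.URLError:
                if attempt < MAX_RETRIES:
                    time.sleep(5 * attempt)  # wait a bit longer each time
        return None

    for url in urls_to_spider:
        if fetch(url) is None:
            failed_urls.append(url)
            error_count += 1
            if error_count >= ERROR_THRESHOLD:
                print("Error threshold reached -- stopping instead of uploading a partial index")
                break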

    Thanks for your consideration.

    Cheers,

    Patrick.

  • #2
    In V6 it will be easier to get a list of errors that occurred.

    But no other changes have been implemented in this area. Multiple retries only occasionally make sense. In most cases a broken link will stay broken for the duration of the indexing process (on most sites that have bad pages, the pages have been bad for months or years). And it is hard to always be sure whether a download failure is due to a local ISP error or a remote server error.
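
    If retries were ever added, the spider would have to guess which failures are worth a second attempt. A minimal sketch of that guess in Python -- hypothetical, not how Zoom actually classifies errors:

    import socket
    import urllib.error

    def is_transient(exc):
        """Guess whether a failed download is worth retrying.

        A timeout or connection-level error may be a passing local/ISP
        glitch; a definite HTTP 404 means the remote server says the
        page is gone, and it will almost certainly still be gone on
        the next attempt.
        """
        if isinstance(exc, urllib.error.HTTPError):
            # Must be checked first: HTTPError is a subclass of URLError.
            return exc.code in (408, 429, 500, 502, 503, 504)
        if isinstance(exc, (urllib.error.URLError, socket.timeout)):
            return True  # DNS failure, refused connection, timeout, ...
        return False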

    Stopping after X errors might make more sense.

    If it takes you 36 hours to rebuild the index, what I would suggest is that you back up your current good set of index files from time to time. You should do this anyway, in case of hard disk failure or the like.
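
    For example, something along these lines, run before each upload. A generic sketch -- both paths are placeholders for your own setup:

    import shutil
    import time
    from pathlib import Path

    INDEX_DIR = Path("zoom_index")       # placeholder: folder holding the current index files
    BACKUP_ROOT = Path("index_backups")  # placeholder: ideally on a different disk

    def backup_index():
        """Copy the current known-good index files to a timestamped folder."""
        dest = BACKUP_ROOT / time.strftime("%Y%m%d-%H%M%S")
        shutil.copytree(INDEX_DIR, dest)
        return dest

    # Run before uploading a freshly built index, so a bad build can
    # always be rolled back to the last known-good set of files.
    backup_index()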
