Auto pause upon error
Anonymous
06-13-2005, 12:49 PM
A really useful feature to prevent timeouts would be for the indexer to pause itself, and possibly beep, whenever it encounters an error. Is this possible?
thanks
I'm not sure why this would prevent timeouts. Perhaps you have a specific situation in mind that you can elaborate on? Are you talking about timeouts during spider mode indexing?
A timeout may occur when the Indexer (in spider mode) is expecting a response from the web server and fails to receive one after about a minute of waiting. This is often due to a failure to connect to the server, or the server may be overloaded and unable to handle the number of requests it is getting (so some get rejected).
In these cases, pausing won't really do much, unless you mean that it should pause and re-send the request after a period of time. Otherwise pausing and beeping would just halt indexing whenever a page times out. This would also be unworkable for scheduled indexing, which may be unattended, as the indexer would just halt and freeze.
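For illustration only, the "pause and re-send the request after a period of time" idea could look roughly like the Python sketch below. This is not how the Zoom Indexer works internally; `fetch_with_retry`, its parameters, and the injected `fetch` callable are hypothetical names, and the sketch assumes `fetch` raises `TimeoutError` when the server does not answer.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    retries: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url); on a timeout, wait and re-send up to `retries` more times.

    `fetch` is a hypothetical downloader assumed to raise TimeoutError
    when the server does not respond in time.
    """
    attempt = 0
    while True:
        try:
            return fetch(url)
        except TimeoutError:
            if attempt >= retries:
                # Give up rather than wait forever, so an unattended
                # (scheduled) run cannot hang on one bad page.
                raise
            attempt += 1
            sleep(wait_seconds)
```

Capping the number of retries is what keeps this usable for scheduled, unattended runs: the page is eventually reported as failed instead of stalling the whole index.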
Anonymous
06-14-2005, 09:56 AM
Hi there
Yes, it is in spider mode where I have the problem. It seems that once the server starts to reject requests, and once the pages to scan run out (while there are still pages to download on the website), it moves on to the next URL because it thinks there are no more pages in that website.
Your point about resuming after a certain amount of time was a good one. That would be great. I've found that I have to wait about 10 minutes or so before resuming the indexing for everything to be fine.
Maybe you could offer either a time limit or no time limit, i.e. a manual restart.
Thank you
wrensoft
06-14-2005, 10:24 PM
If the web server needs 10 minutes of rest before you can view a page, then there is something wrong with the web server.
If Zoom can't download a page, then neither can a normal user view a page on the web site.
So it would seem more logical to fix the problem on the server rather than trying to make Zoom work around the web server problem.
--------
David
jough
08-01-2005, 03:49 AM
I have the same problem but with connectivity on my ISP's side, not my server's side.
Comcast goes down more often than an heiress in a homemade movie, and when the Indexer can't spider the pages it just skips them.
It would be nice if it could re-try skipped pages after it was done.
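As a rough sketch of what "re-try skipped pages after it was done" might mean, the Python below keeps failed URLs in a list and makes another pass over them once the main pass has finished. All names here (`crawl_with_retry_pass`, `fetch`, `max_passes`) are hypothetical, and it assumes connection problems surface as `TimeoutError` or `ConnectionError`; it is not a description of the Indexer's behaviour.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_with_retry_pass(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_passes: int = 2,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch every URL, then re-try the skipped ones in later passes.

    Returns the downloaded pages and the URLs that still failed
    after the last pass.
    """
    pages: Dict[str, bytes] = {}
    pending = list(urls)
    for _ in range(max_passes):
        if not pending:
            break
        still_failing: List[str] = []
        for url in pending:
            try:
                pages[url] = fetch(url)
            except (TimeoutError, ConnectionError):
                # Remember the page instead of silently dropping it.
                still_failing.append(url)
        pending = still_failing
    return pages, pending
```

Only connection-style failures are retried here; other errors (a missing page, a configuration problem) would propagate, since waiting is unlikely to fix them.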
We don't think it would be practical to retry every possible error that can happen when retrieving a page (because most of the time, they really are errors and may be problems with configuration etc. which would not fix themselves just by waiting a period of time). However, if there is a more identifiable error for your problem, then we can consider whether something reasonable can be done.
Perhaps you can send us your index log ("File"->"Save index log to file") and show us what happens specifically when you are having connection issues which impair your indexing.
The other alternative is, of course, to use a computer with a better Internet connection to do your spider mode indexing. Or consider the possibility of indexing offline or on a local server.
jough
08-01-2005, 06:35 AM
With our previous search solution we were indexing via an offline server, but as we're looking to apply Zoom to a dynamic and oft-refreshed site that is built from contributions from hundreds of sources per hour, that's not really practical.
I understand that some errors are simply errors, but I'm seeing timeouts on numerous pages (even when I have a strong internet connection) that could easily be fixed with better log maintenance. The log already keeps track of the kinds of errors reported (to a reasonably fine degree) so it's a matter of parsing the log and making a list of pages to try again later.
I wrote a script that is feeding those pages back to the indexer one by one as starting points, but it would be easier if the indexer had this function natively.
Consider this a feature request, I guess.
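The script mentioned above was not posted, so the following is only a guess at the general approach: scan a saved index log for timeout lines and collect the URLs to feed back as starting points. The actual layout of the Zoom index log is not shown in this thread; the `"timed out"` marker and the assumption that the URL appears on the same line are placeholders to adjust against a real log.

```python
import re
from typing import Iterable, List

# Assumption: each error line contains the page URL somewhere in it.
URL_RE = re.compile(r"https?://\S+")


def urls_to_retry(log_lines: Iterable[str], marker: str = "timed out") -> List[str]:
    """Collect URLs from log lines that mention a timeout, in order, without duplicates.

    `marker` is a hypothetical piece of text identifying a timeout line;
    pass it in lowercase, since lines are lowercased before matching.
    """
    seen = set()
    retry: List[str] = []
    for line in log_lines:
        if marker not in line.lower():
            continue
        match = URL_RE.search(line)
        if match is None:
            continue
        # Drop trailing punctuation that is unlikely to be part of the URL.
        url = match.group(0).rstrip(".,;)\"'")
        if url not in seen:
            seen.add(url)
            retry.append(url)
    return retry
```

With a saved log, `urls_to_retry(open("index.log", encoding="utf-8"))` would give the list of pages to try again, where `index.log` is whatever file name was chosen when saving the log.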
wrensoft
08-01-2005, 10:41 PM
I am not sure how you can have a "strong" internet connection if you have timeouts all the time. That would seem to be a contradiction. I would be complaining to our ISP if we had this problem.
I think Ray wanted to see part of your log file to identify the exact error you are getting. There are several different timeout situations, and from your description we can't tell which particular one you are getting.
--
David
jough
08-02-2005, 05:43 AM
Quote, originally posted by wrensoft: I am not sure how you can have a "strong" internet connection if you have timeouts all the time. That would seem to be a contradiction. I would be complaining to our ISP if we had this problem.
I use Comcast at home (horrible connectivity, but it's usually free because of all of the downtime) and Verizon DSL at the office (2.4 Mbps, always up, fast both uploading and downloading; in other words, "strong"). Sometimes I use a wireless connection too, which is always patchy.
In any case, since I've broken up my site into smaller chunks for spidering I haven't seen the problem. It seems to occur consistently (regardless of whether I'm on the DSL line, a client's T1, wireless, etc.) when I just start the spider off at the root of the site and let it wander.
The next time I see the problem I'll send my log file.
Or heck, you can try it: http://poetryx.com (making sure that the base url includes the subdomains poetry.poetryx.com and articles.poetryx.com).