Feature request: retry several times at a set interval after a lost network connection
11-30-2008, 10:16 AM
My indexer completed indexing before scanning all urls. The last log message is: Could not download file: No active connection to the Internet found. Check your connection and network settings.
My network connection seems to be unstable when downloading large amounts of data, but each loss lasts only one or two seconds. Is it possible to add an option to retry a number of times at a set interval instead of stopping the indexing?
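To illustrate the kind of option being requested, here is a minimal sketch in Python. It is not the indexer's actual code or configuration: `fetch_with_retry`, `retries`, and `interval` are hypothetical names, and `fetch` stands in for whatever performs the download.

```python
import time


def fetch_with_retry(fetch, url, retries=3, interval=2.0, sleep=time.sleep):
    """Call fetch(url); on a connection error, wait and try again.

    Hypothetical helper: `fetch` is any callable that raises
    ConnectionError when the network is down.
    """
    attempts = retries + 1  # the first try plus `retries` re-tries
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == attempts:
                raise  # still down after the last re-try: give up
            sleep(interval)  # fixed wait before the next attempt
```

With `retries=3` and `interval=2.0`, a drop of one or two seconds would be ridden out on the first or second re-try, while a longer outage would still end the run after roughly six seconds of waiting.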
11-30-2008, 10:22 AM
When this happens, can I safely delete the first 15,000 URLs from the scan list and start incremental indexing, if the log says 15,000 URLs have been scanned?
11-30-2008, 11:45 AM
I could not find a suitable incremental indexing option to index the unindexed start points that still remained in the start points list.
Generally, it would make more sense to address the issue as to why your network connection is unstable. Are you on a wireless network?
Incremental indexing is not designed to address this problem. Incremental update will not re-index the URLs that have been deleted.
Given the number of broken links that can be found on an average website, re-attempting every failed download can be very inefficient. Imagine finding 20 broken links on a site due to a design mistake (e.g. relative links that are wrong after the files were moved to a different folder). If each of these broken links has to be re-tried x times before we give up on it, indexing would slow down significantly for everyone else.
Also, since we don't know when your network connection will be back up, the time to wait would vary so much that no reasonable default could be chosen.
Having said that, we'll keep this scenario in mind and see if we can come up with anything that may help in the future.
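The cost argument above can be put into rough numbers. The sketch below is only illustrative arithmetic: the function name and the wait and attempt durations are assumptions, not figures from the indexer.

```python
def extra_retry_seconds(broken_links, retries, wait_seconds, attempt_seconds):
    """Added time spent re-trying links that will never succeed.

    Each re-try costs one wait plus one failed download attempt.
    """
    return broken_links * retries * (wait_seconds + attempt_seconds)


# 20 broken links, 3 re-tries each, an assumed 5 s wait and 1 s per attempt.
overhead = extra_retry_seconds(20, 3, 5, 1)
```

Under those assumed values, the 20 broken links in the example would add six minutes to a run that gains nothing from the re-tries.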
11-30-2008, 11:36 PM
I am not on a wireless network, and I will have my network connection checked today. But in case it happens again for some unpredictable reason, can you tell me what triggers the indexing to complete: a certain number of broken links, or the detection of a lost network connection? If the latter, can you add a re-try option to the configuration file so users can choose according to their situation? And if the network is still down at the end of the re-tries, can you make it possible to incrementally index the remaining unindexed start points without a full reindex? For now, I have found no way to index the remaining start points without a full reindex. Since I cannot make any changes to the start points list, the unindexed ones are still in it. So when I tried the "add start points to existing index" option, it said the URLs were already in the list and would not start indexing. That means whenever my network fails I will have to reindex everything.
Indexing completes when either:
There are no more links to follow (in spider mode), or there are no more files to scan in the given folder(s) (offline mode).
The user has pressed "Stop Indexing"
The internet/network connection is lost during indexing (spider mode only).
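The three completion conditions listed above can be sketched as a simple crawl loop. This is an illustration only, not the indexer's implementation: `run_spider`, `StopReason`, `fetch_links`, and `stop_requested` are hypothetical names.

```python
from collections import deque
from enum import Enum


class StopReason(Enum):
    NO_MORE_LINKS = "no more links to follow"
    USER_STOPPED = "user pressed Stop Indexing"
    CONNECTION_LOST = "network connection lost"


def run_spider(start_urls, fetch_links, stop_requested=lambda: False):
    """Crawl until one of the three completion conditions is met.

    Hypothetical sketch: fetch_links(url) returns the links found on a
    page and raises ConnectionError when the network is down.
    Returns (reason, scanned_urls, pending_urls).
    """
    pending = deque(start_urls)
    seen = set(start_urls)
    scanned = []
    while pending:
        if stop_requested():
            return StopReason.USER_STOPPED, scanned, list(pending)
        url = pending.popleft()
        try:
            links = fetch_links(url)
        except ConnectionError:
            pending.appendleft(url)  # this URL was not scanned
            return StopReason.CONNECTION_LOST, scanned, list(pending)
        scanned.append(url)
        for link in links:
            if link not in seen:  # queue each URL only once
                seen.add(link)
                pending.append(link)
    return StopReason.NO_MORE_LINKS, scanned, []
```

Returning the pending URLs alongside the reason makes it visible what was left unscanned when the run ended early.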
We'll look into what might best help people with unreliable network connections for a future version, but it is not high on our priority list at the moment due to low user demand.
12-01-2008, 12:28 AM
So while I wait for the new version, what should I do when this happens? A full reindex?
12-01-2008, 12:33 AM
Is it easy to make the indexer pause the indexing instead of finishing when the network fails?
12-01-2008, 02:29 AM
My network connection has been checked and nothing is wrong. If this happens again it will be a big waste of time and work, so I want to request a customization of V6 that changes "stop indexing" to "pause indexing" when the network connection fails.
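One way to picture "pause instead of stop" is to write the crawl progress to disk when the connection fails and read it back on the next run. The sketch below assumes a JSON state file and hypothetical function names. It covers only the list of scanned and pending URLs, not the index data itself.

```python
import json
from pathlib import Path


def save_checkpoint(path, scanned, pending):
    """Write crawl progress to disk so a later run can resume."""
    state = {"scanned": list(scanned), "pending": list(pending)}
    Path(path).write_text(json.dumps(state), encoding="utf-8")


def load_checkpoint(path, start_urls):
    """Return (scanned, pending); start fresh if no checkpoint exists."""
    p = Path(path)
    if not p.exists():
        return [], list(start_urls)
    state = json.loads(p.read_text(encoding="utf-8"))
    return state["scanned"], state["pending"]
```

A resumed run would then continue from the saved pending URLs rather than from the original start points.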
12-01-2008, 02:38 AM
I cannot start indexing again until this has been done, because I do not want to lose all the work that has been done and restart all over again. Would you please tell me how long it takes for this customization to be done and how much it costs? Thanks.
12-01-2008, 04:24 AM
We are busy for the next week or two getting V6 out the door. We really need to get this done before the Christmas break and this is our priority at the moment (and to be honest, the 12 posts you have made in the last 48 hours aren't helping us meet that deadline).
When Ray referred to a future version he was referring to V6.1 or V7, both of which are many months (or years) away.
We would be happy to look at doing this as custom development, but we will have no available resources to start this work for a few weeks. So you might want to get back to us either just before or after Xmas.
If your network connection keeps dropping then by definition you have (or had) a problem with your network connection. There is (or was) a problem, but being able to identify what it is might be a different story.
Other options you might want to investigate in the meantime include:
1) Split your index into many smaller sets of index files and use MasterNode (http://www.wrensoft.com/masternode/index.html) to combine the search results. Then any indexing failure will result in a much smaller amount of missing data.
2) Break up your index into smaller chunks, then index each chunk incrementally, saving the set of index files before each step. In the event of a network failure, you can then fall back to the last backup point. You could probably automate this with a batch script if required.
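Option 2 could be automated along the following lines. This is a sketch in Python rather than a batch script, and it makes assumptions: `index_chunk` is a hypothetical stand-in for whatever launches one incremental indexing run, and the index files are assumed to live in a single folder that can be copied as a whole.

```python
import shutil
from pathlib import Path


def index_in_chunks(chunks, index_chunk, index_dir, backup_dir):
    """Index each chunk in turn, backing up the index files after each success.

    Hypothetical sketch: index_chunk(urls) stands in for one incremental
    indexing run and raises ConnectionError if the network fails.
    Returns the number of chunks completed.
    """
    index_dir, backup_dir = Path(index_dir), Path(backup_dir)
    for done, urls in enumerate(chunks):
        try:
            index_chunk(urls)
        except ConnectionError:
            # Fall back to the last backup point, if one exists.
            if backup_dir.exists():
                shutil.rmtree(index_dir, ignore_errors=True)
                shutil.copytree(backup_dir, index_dir)
            return done
        # Chunk succeeded: replace the backup with the current index files.
        shutil.rmtree(backup_dir, ignore_errors=True)
        shutil.copytree(index_dir, backup_dir)
    return len(chunks)
```

The return value tells the caller which chunk to resume from. If the very first chunk fails there is no backup yet, so the index folder is left as it was at the moment of failure.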