V5 development progress - Search depth control
Search depth control lets you configure how the search engine behaves when searching very large sites. It affects how far a search will go when a large number of results is returned (usually when someone performs a vague search on a large site with tens of thousands of pages).
This feature can be found under the "Limits" tab of the configuration window. There is a slider bar labeled "Optimize search behaviour for large sites".
You can choose between a slower search that returns more ranked results and a faster search that returns fewer ranked results. This is useful when you have hundreds of thousands of pages in your index, but it has little to no effect on small sites.
It has the biggest impact when people use exact phrase searches, the phrase is made up of common words, and the search runs across a large number of large documents. This type of search would normally be fairly disk and CPU intensive. Selecting a fast search trims the search set down to the most promising documents. Fewer documents are examined for the exact phrase, which is faster, but runs the risk that some hits are missed.
Selecting a 'slow search' results in a near exhaustive search of the entire document set. This can be resource intensive under the conditions noted above, but it is unlikely that any result will be missed.
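The trade-off between the two ends of the slider can be illustrated with a minimal Python sketch. This is not Zoom's actual implementation: the function name phrase_search, the max_candidates parameter, the word-count scoring and the sample documents are all assumptions made for illustration only.

```python
def phrase_search(docs, phrase, max_candidates=None):
    """Return ids of documents containing `phrase` as an exact word sequence.

    docs: dict mapping document id -> text (hypothetical sample data).
    max_candidates: None examines every candidate (the "slow" end);
    an integer keeps only that many top-scoring candidates (the "fast" end).
    """
    words = phrase.lower().split()
    if not words:
        return []
    tokenised = {doc_id: text.lower().split() for doc_id, text in docs.items()}

    # Cheap pre-filter: a document must contain every word of the phrase.
    # Score it by how often those words occur, as a stand-in for ranking.
    scored = []
    for doc_id, tokens in tokenised.items():
        counts = [tokens.count(w) for w in words]
        if all(counts):
            scored.append((sum(counts), doc_id))

    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    if max_candidates is not None:
        scored = scored[:max_candidates]

    # Expensive step: scan each remaining candidate for the exact phrase.
    hits = []
    n = len(words)
    for _, doc_id in scored:
        tokens = tokenised[doc_id]
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


docs = {
    "a.html": "the end of the road is the end",
    "b.html": "in the end",
    "c.html": "end of the line the the the",
}

slow_hits = phrase_search(docs, "the end")                    # examine everything
fast_hits = phrase_search(docs, "the end", max_candidates=2)  # trimmed search set
```

In this sketch the trimmed search examines only a.html and c.html, so the genuine match in b.html is missed. That is the kind of risk described above for the fast setting.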
When this feature activates, the user is warned with the following message: "Your search query contained too many common words to return the entire set of results available. Please try again with a more specific query for better results."
This V5 feature replaces the often misunderstood "Max context seeks" feature from V4 of Zoom.
Because of the above changes, some users with large sites will find that in V5 it is no longer possible to do a wildcard search for "**" or "*e*" to get all the pages indexed. The search set will be trimmed down to return results more quickly. Note that these searches were typically only used for maintenance purposes by the website owner, and it is unlikely that an end user will perform such searches on your website (or expect useful results from doing so). For those not familiar with wildcard searches, *e* means find every page that contains at least one word that contains the letter 'e'.
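To make the wildcard semantics concrete, here is a small Python sketch. It uses Python's standard fnmatch module as a stand-in for wildcard matching and is not Zoom's own matcher; the pages_matching function and the sample index are hypothetical.

```python
from fnmatch import fnmatchcase


def pages_matching(index, pattern):
    """Return pages that contain at least one word matching the pattern.

    index: dict mapping page name -> list of words on that page
    (hypothetical sample data, not Zoom's index format).
    """
    pattern = pattern.lower()
    return sorted(
        page
        for page, words in index.items()
        if any(fnmatchcase(word.lower(), pattern) for word in words)
    )


index = {
    "about.html": ["about", "the", "team"],
    "faq.html": ["faq"],
    "index.html": ["welcome", "home"],
}

pages_with_e = pages_matching(index, "*e*")  # pages with a word containing 'e'
all_pages = pages_matching(index, "**")      # every page that has any word
```

On a real site almost every page contains a word with the letter 'e', which is why such a query behaves like "list everything" and produces a very large search set.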
If you still wish to browse an entire list of the files indexed, you can do any of the following:
a) Browse the zoom_pagedata.zdat file (making sure NOT to edit the file).
b) Use the new "View or delete pages from existing index" feature in the Zoom Indexer application (find it under the menus: "Index"->"Manage existing index"). This allows you to browse the list of pages that have been indexed. Remember to first load the ZCFG file with the settings for the corresponding index session.
c) Enable logging to file when indexing, so that a list of all indexed files is saved to disk during indexing. See the "Index Log" tab of the Configuration window. You can turn off other messages so that only "Indexing" messages are stored.