Indexing stops at same point
johnsatt
04-20-2005, 01:25 AM
I'm using Zoom Search Engine Professional 4.0.1061 and I'm trying to index a user's folder full of pdf, tif, doc, xls, ppt, etc. The user has 11,120 files in 704 folders. When I start to index the files, everything works fine until it stops indexing at the same point every time (which incidentally is about 3-5 minutes into the indexing).
I'm running Zoom Search Engine on a P4 3.0 GHz Windows 2000 server with 1 GB of RAM and approx. 500+ GB of storage. The settings I'm using in Zoom are as follows:
Max. pages to scan 20000
Max. unique words 50000
Max. file size scanned 6024
Max. description length 150
Multiple threads 4
If I index an individual folder containing about 400 PDFs, it works fine. Any input into why this is happening would be much appreciated.
johnsatt
04-20-2005, 01:27 AM
Also, FYI: xlhtml.exe stays hung in memory after the indexing stops and I exit the program.
sizbut
04-20-2005, 11:33 AM
That may be your clue. In fact WrenSoft have just posted a new version of the Excel plugin on their website that includes fixes for issues that seem very similar to yours.
johnsatt
04-20-2005, 07:40 PM
I have updated the Excel plugin and it works. Thank you. Now I need to figure out how it can index all the files and subfolders. It seems to only index around 1,700 files when in fact there are 11,120 files in 704 folders. Anyone have any suggestions? Should I index just one huge folder or should I index individual folders? Can one index do the subfolders as well? How should I approach this?
Are you using spider mode? If so, take a look at:
http://www.wrensoft.com/zoom/support/faq_problems.html#spider_finding
Offline mode indexes everything within the start directory, but it requires files to be on a local drive rather than on a web server. Offline mode also cannot index dynamically generated pages (e.g. .php, .asp, etc).
Generally, you should turn on Verbose mode if you want to find out why certain files are being skipped. It may simply be due to your configuration, such as the max file size limit.
You should also note that "TIF" is not a supported file format, assuming this is the TIFF image format. Make sure to exclude this from your extensions list in Zoom.
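If you want a rough list of candidates before reading through the Verbose log, something like the following sketch may help. It is not part of Zoom and does not reproduce its skip logic; it only walks a folder tree and flags files that are larger than a size limit or have an extension from an "unsupported" set. The function name, the assumption that the "Max. file size scanned" value is in kilobytes, and the contents of the extension set are all illustrative guesses, so adjust them to your own configuration.

```python
import os

# Assumptions (not taken from the Zoom documentation): the limit is treated
# as kilobytes, and the unsupported set only holds the TIFF extensions.
MAX_FILE_SIZE_KB = 6024
UNSUPPORTED_EXTENSIONS = {".tif", ".tiff"}


def find_likely_skipped(start_dir, max_kb=MAX_FILE_SIZE_KB,
                        unsupported=UNSUPPORTED_EXTENSIONS):
    """Return (path, reason) pairs for files an indexer might skip."""
    skipped = []
    # os.walk visits start_dir and every subfolder beneath it.
    for folder, _subfolders, filenames in os.walk(start_dir):
        for name in filenames:
            path = os.path.join(folder, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in unsupported:
                skipped.append((path, "unsupported extension"))
            elif os.path.getsize(path) > max_kb * 1024:
                skipped.append((path, "over size limit"))
    return skipped
```

Calling find_likely_skipped on your start directory gives a list you can compare against what Verbose mode reports.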
Anonymous
04-21-2005, 05:25 AM
I think I got it nailed: certain folders named with a leading underscore (__Scans) were not being indexed. I renamed the folders to Scans and they are now being indexed. However, regardless of how many files are being indexed, the summary at the end seems to be incorrect. Is this normal?
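For anyone else with a large folder tree, here is a minimal sketch for finding folders whose names start with an underscore so they can be checked or renamed. Whether Zoom skips such folders by design is not confirmed here; this only reflects what I observed. The function name is made up for illustration.

```python
import os


def folders_with_leading_underscore(start_dir):
    """Return paths of all subfolders whose name begins with '_'."""
    matches = []
    # os.walk yields each folder together with its immediate subfolders.
    for folder, subfolders, _filenames in os.walk(start_dir):
        for name in subfolders:
            if name.startswith("_"):
                matches.append(os.path.join(folder, name))
    return sorted(matches)
```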
Can you give us some indication of how or when it is incorrect? For example, a screenshot to show us how the Status tab is inconsistent with the Summary at the end?
One thing to note is that if you have "Scan files with no extensions" enabled, these files would not be counted in your extensions list (e.g. ".html files scanned") and thus the totals may not add up if you look at it that way. In the upcoming version, there will be an extra counter for "files with no extensions" to make this more obvious.
Also note that even if a file is stored on disk as "index.html", if the spider finds it via a URL such as "http://mysite.com/", it would still be counted as a file with no extension.
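To illustrate the point about URLs, here is a small sketch of how a tally keyed on the extension in the URL path ends up with a "no extension" bucket. This is not Zoom's actual counting code; the function names and the "(none)" label are made up for the example.

```python
import os
from collections import Counter
from urllib.parse import urlparse


def url_extension(url):
    """Return the lowercase extension of the URL's path, or '' if none."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower()


def count_by_extension(urls):
    """Tally URLs per extension; URLs without one go under '(none)'."""
    return Counter(url_extension(u) or "(none)" for u in urls)
```

With this, "http://mysite.com/" falls under "(none)" while "http://mysite.com/index.html" falls under ".html", even though both may serve the same file on disk.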
Anonymous
04-26-2005, 09:40 PM
You hit the nail right on the head. Everything seems to be working great now! I can't believe the power of this little program. Sure beats spending $2K US on a Google Mini. My boss loves it. You guys are great! :lol: