Excluding numeric codes in indexing
Is it possible to configure Zoom (I'm using Zoom Pro) so that it does not index "words" with numerals in them?
Some of the documents I'm indexing are lists of names with codes next to them, the codes consisting of numerals and letters. For instance,
Howard, J. R 2b 7568
Although the code is meaningful, there is no point in indexing it since many items in the list will also have the same code, so it doesn't help in finding the entry required.
I'm trying here to reduce the size of the index, since with these lists of personal names there are very many words (= names) to be added to the index.
Oops - you've already answered this!
So sorry - I've just found the old post where this question was raised and answered. (http://www.wrensoft.com/forum/showthread.php?t=669)
I'll experiment with the suggestion put forward there.
It's worth pointing out that that suggestion does not reduce the unique word count either. It will, however, save index space (which makes for a speedier search), because we don't store any additional data for these numbers. But the number itself is still stored in the dictionary, because we need it for reconstructing the context description.
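For anyone trying to picture the difference, here is a rough Python sketch of the idea. It is not Zoom's actual code or index format; `build_index`, `dictionary` and `postings` are made-up names for illustration only. Every word still gets a dictionary entry (so it counts as a unique word), but words containing numerals get no location data stored against them.

```python
import re
from collections import defaultdict

HAS_DIGIT = re.compile(r"\d")  # matches any word containing a numeral


def build_index(documents):
    """Toy index builder (illustrative only, not Zoom's real format)."""
    dictionary = {}               # word -> word id; every word gets an entry
    postings = defaultdict(list)  # word id -> list of (document number, position)
    for doc_no, text in enumerate(documents):
        for pos, raw in enumerate(text.split()):
            word = raw.strip(".,;:").lower()
            if not word:
                continue
            # The word always goes into the dictionary, so the unique
            # word count is unaffected by the numeral rule below.
            word_id = dictionary.setdefault(word, len(dictionary))
            if HAS_DIGIT.search(word):
                # Numeral-bearing word: keep the dictionary entry,
                # but store no additional (location) data for it.
                continue
            postings[word_id].append((doc_no, pos))
    return dictionary, postings


dictionary, postings = build_index(["Howard, J. R 2b 7568"])
```

With the sample line from the first post, all five words end up in `dictionary`, but only "howard", "j" and "r" have entries in `postings`. That is the sense in which the index gets smaller while the unique word count stays the same.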
Thank you, Ray, that's good. I was having problems with the number of unique words until I decided to exclude the 50% of documents that had been processed using OCR and restrict the indexing to those documents which were purely computer-generated. But my concern now is to minimise the size of the index to enable the fastest searches. So thank you for your confirmation.
Make sure you are using the CGI version if speed is of the essence.
See the benchmarks page here for some idea of the speed differences between platforms: