Excluding numeric codes in indexing



studious
08-15-2010, 05:48 PM
Is it possible to configure Zoom (I'm using Zoom Pro) so that it does not index "words" with numerals in them?

Some of the documents I'm indexing are lists of names with codes next to them, the codes consisting of numerals and letters. For instance,

Howard, J. R 2b 7568

Although the code is meaningful, there is no point in indexing it since many items in the list will also have the same code, so it doesn't help in finding the entry required.

I'm trying here to reduce the size of the index, since with these lists of personal names there are very many words (= names) to be added to the index.
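To make the request concrete, here is a minimal Python sketch of the filtering rule being asked for: drop any token that contains a numeral. This is only an illustration of the rule, not Zoom's configuration or code; the function name `indexable_tokens` and the tokenizing pattern are assumptions made for the example.

```python
import re

def indexable_tokens(line):
    """Return the tokens of `line` that contain no numerals.

    Hypothetical helper for illustration only; it does not reflect
    how Zoom tokenizes text internally.
    """
    # Split into runs of ASCII letters and digits, discarding punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+", line)
    # Keep only the tokens in which no character is a digit.
    return [t for t in tokens if not any(c.isdigit() for c in t)]

print(indexable_tokens("Howard, J. R 2b 7568"))
```

For the sample entry above, the codes `2b` and `7568` are dropped and only `Howard`, `J` and `R` remain as candidates for indexing.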

studious
08-15-2010, 06:43 PM
So sorry - I've just found the old post where this question was raised and answered (http://www.wrensoft.com/forum/showthread.php?t=669).
I'll experiment with the suggestion put forward there.

Ray
08-16-2010, 12:42 AM
It's worth pointing out that this does not reduce the unique word count. It will, however, save index space (which makes for a speedier search), because we don't store any additional data for the skipped words. The number itself is still stored in the dictionary, because we need it for reconstructing the context description.
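The distinction between the dictionary and the rest of the index can be sketched with a toy model in Python. This is not Zoom's actual data structure; the names `build_toy_index`, `dictionary` and `postings` are assumptions for illustration. Every distinct token goes into the dictionary, but tokens containing numerals get no search data.

```python
import re

def build_toy_index(docs, skip_digits=True):
    """Toy model only: returns (dictionary, postings).

    dictionary: every distinct token seen (needed to rebuild context text).
    postings:   token -> set of document ids, for searchable tokens only.
    """
    dictionary = set()
    postings = {}
    for doc_id, text in enumerate(docs):
        for token in re.findall(r"[A-Za-z0-9]+", text.lower()):
            # The token always counts as a unique word.
            dictionary.add(token)
            # Skipped tokens get no search data, which saves index space.
            if skip_digits and any(c.isdigit() for c in token):
                continue
            postings.setdefault(token, set()).add(doc_id)
    return dictionary, postings

docs = ["Howard, J. R 2b 7568", "Smith, A. R 2b 7568"]
dictionary, postings = build_toy_index(docs)
print(len(dictionary), len(postings))
```

In this toy model the dictionary size is the same whether or not numeric tokens are skipped; only the amount of search data changes.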

studious
08-20-2010, 12:23 AM
Thank you, Ray, that's good. I was having problems with the number of unique words until I decided to exclude the 50% of documents that had been processed using OCR and restrict the indexing to those documents which were purely computer-generated. But my concern now is to minimise the size of the index to enable the fastest searches. So thank you for your confirmation.

Ray
08-20-2010, 12:58 AM
Make sure you are using the CGI version if speed is of the essence.

See the benchmarks page here for some idea of the speed differences between platforms:
http://www.wrensoft.com/zoom/benchmarks.html