Thread: Excluding numeric codes in indexing

  1. #1
    Join Date
    Jun 2010
    Posts
    10

    Excluding numeric codes in indexing

    Is it possible to configure Zoom (I'm using Zoom Pro) so that it does not index "words" with numerals in them?

    Some of the documents I'm indexing are lists of names with codes next to them, the codes consisting of numerals and letters. For instance,

    Howard, J. R 2b 7568

    Although the code is meaningful, there is no point in indexing it since many items in the list will also have the same code, so it doesn't help in finding the entry required.

    I'm trying here to reduce the size of the index, since with these lists of personal names there are very many words (= names) to be added to the index.

  2. #2
    Join Date
    Jun 2010
    Posts
    10

    Oops - you've already answered this!

    So sorry - I've just found the old post where this question was raised and answered (http://www.wrensoft.com/forum/showthread.php?t=669).
    I'll experiment with the suggestion put forward there.

  3. #3
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,571

    It's worth pointing out that this does not reduce the unique word count. It will, however, save index space (which makes for a speedier search) because we don't store any additional data for the number. The number itself is still stored in the dictionary, though, because we need it to reconstruct the context description.
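
    As a rough illustration of the distinction above, here is a toy inverted index in Python. It is only a sketch and not Zoom's actual index format: the names `build_index`, `dictionary` and `postings` are hypothetical, and "contains a digit" is an assumed rule for what counts as a numeric code. Every token still lands in the dictionary (so the unique word count is unchanged), but tokens with digits get no per-document data.

```python
import re
from collections import defaultdict

# Assumed rule: a token is a "numeric code" if it contains any digit.
HAS_DIGIT = re.compile(r"\d")
# Assumed tokenizer: runs of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def build_index(docs):
    """Build a toy inverted index from a {doc_id: text} mapping.

    Every token is added to the dictionary, but tokens containing a
    digit get no postings (no per-document data is stored for them).
    """
    dictionary = set()
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            dictionary.add(token)              # still counts as a unique word
            if not HAS_DIGIT.search(token):
                postings[token].add(doc_id)    # searchable words only
    return dictionary, dict(postings)


docs = {1: "Howard, J. R 2b 7568", 2: "Howard, K. R 2b 7568"}
dictionary, postings = build_index(docs)
```

    In this toy model the codes "2b" and "7568" appear in `dictionary` but not in `postings`, so the saving is in the per-word data rather than in the number of unique words.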
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

  4. #4
    Join Date
    Jun 2010
    Posts
    10

    Thank you, Ray, that's good. I was having problems with the number of unique words until I decided to exclude the 50% of documents that had been processed using OCR and restrict the indexing to those documents which were purely computer-generated. But my concern now is to minimise the size of the index to enable the fastest searches. So thank you for your confirmation.

  5. #5
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,571

    Make sure you are using the CGI version if speed is of the essence.

    See the benchmarks page here for some idea of the speed differences between platforms:
    http://www.wrensoft.com/zoom/benchmarks.html
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
