Excluding numeric codes in indexing


  • Excluding numeric codes in indexing

    Is it possible to configure Zoom (I'm using Zoom Pro) so that it does not index "words" with numerals in them?

    Some of the documents I'm indexing are lists of names with codes next to them, the codes consisting of numerals and letters. For instance,

    Howard, J. R 2b 7568

    Although the code is meaningful, there is no point in indexing it since many items in the list will also have the same code, so it doesn't help in finding the entry required.

    I'm trying here to reduce the size of the index, since with these lists of personal names there are very many words (= names) to be added to the index.

  • #2
    Oops - you've already answered this!

    So sorry - I've just found the old post where this question was raised and answered. (http://www.wrensoft.com/forum/showthread.php?t=669)
    I'll experiment with the suggestion put forward there.



    • #3
      It's worth pointing out that skipping these words does not reduce the unique word count. It will, however, save index space (which makes for a speedier search) because we don't store any additional data for a skipped word. The number itself is still stored in the dictionary, because we need it to reconstruct the context description.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
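
To illustrate the distinction Ray describes, here is a minimal sketch in Python. It is not Zoom's implementation or configuration; the function `build_index`, the tokenizing rule, and the second sample line are assumptions made up for this example. It shows how a word containing a numeral can still count as a unique word in the dictionary while carrying no search data of its own.

```python
import re
from collections import defaultdict

# Hypothetical rules for this sketch only: a token is a run of letters or
# digits, and any token containing a digit is treated as a skipped word.
TOKEN = re.compile(r"[A-Za-z0-9]+")
HAS_DIGIT = re.compile(r"\d")

def build_index(docs):
    """Build a toy index from a dict of doc_id -> text."""
    dictionary = set()            # every unique word, skipped or not
    postings = defaultdict(set)   # word -> doc ids, searchable words only
    for doc_id, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            # The word is always recorded, so context can be rebuilt later.
            dictionary.add(token)
            if HAS_DIGIT.search(token):
                continue          # skipped word: no search data is stored
            postings[token].add(doc_id)
    return dictionary, dict(postings)

# The first line is the example from the question; the second is invented.
docs = {1: "Howard, J. R 2b 7568", 2: "Howard, A. R 2b 7570"}
dictionary, postings = build_index(docs)
```

In this toy model the codes `2b`, `7568` and `7570` still appear in `dictionary`, so the unique word count is unchanged, but only the names and initials have entries in `postings`, which is the part that shrinks.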



      • #4
        Thank you, Ray, that's good. I was having problems with the number of unique words until I decided to exclude the 50% of documents that had been processed using OCR and restrict the indexing to those documents which were purely computer-generated. But my concern now is to minimise the size of the index to enable the fastest searches. So thank you for your confirmation.



        • #5
          Make sure you are using the CGI version if speed is of the essence.

          See the benchmarks page here for some idea of the speed differences between platforms:
          http://www.wrensoft.com/zoom/benchmarks.html
          --Ray
