file size and word limit


  • file size and word limit

    Hi,
    I have tried to find an answer to this but had no luck, so apologies if it has already been answered.
    I am trying to index some local directories, and I am finding that the zoom_pagetext.zdat file is massive (often over a gigabyte) compared to the amount of data being searched.
    I have already added the most common words to the exclusion list, but the index still hits the 500,000-word limit pretty quickly.

    Many of the files I am trying to index contain large amounts of numerals. Could this be inflating the index, and is there a way to exclude numbers?

    So far it's a great tool, but this is starting to limit me.
    Cheers,
    John

  • #2
    We found that when indexing HTML pages with a few images, the index was around 10% of the size of the source data. Details can be found here. But the ratio will vary depending on the source files (sometimes by a lot).

    There is no real problem with the zoom_pagetext being large.

    You can have a look in the zoom_dictionary.zdat file if you want to see the words found in your files. There are only around 50,000 words in common use in the English language, so to get to 500,000 you must have a lot of numbers, names, or other unique tokens.
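    If you want a quick sanity check before digging through zoom_dictionary.zdat, a rough script like the one below can estimate how much of a directory's distinct-token count comes from purely numeric tokens. This is just a sketch using a simple alphanumeric tokenizer, not Zoom's actual indexing rules, and the `*.txt` glob is an assumption you would adjust for your own file types.

    ```python
    # Rough estimate of how many distinct tokens in a directory are purely
    # numeric. NOT Zoom's actual tokenizer -- just a sanity-check approximation.
    import re
    from pathlib import Path

    TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

    def count_distinct_tokens(root):
        """Return (distinct word-like tokens, distinct all-digit tokens)."""
        words, numbers = set(), set()
        for path in Path(root).rglob("*.txt"):  # adjust the glob for your files
            text = path.read_text(errors="ignore")
            for tok in TOKEN_RE.findall(text.lower()):
                (numbers if tok.isdigit() else words).add(tok)
        return len(words), len(numbers)

    if __name__ == "__main__":
        w, n = count_distinct_tokens(".")
        print(f"distinct word-like tokens: {w}, distinct numeric tokens: {n}")
    ```

    If the numeric count dwarfs the word count, that would confirm numerals are what is filling up the dictionary.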

    You can also upgrade to the Enterprise edition, where there is no fixed limit on the number of words.

    For skipping numbers see this old post.



    • #3
      Hi David,
      We have the Enterprise edition. I tried the *1 etc., but it still seems to be including them in the dictionary.



      • #4
        If you have the Enterprise edition, just go to the limits configuration window and change the word limit from 500,000 to 1,000,000. Problem solved.
