Search "words" - How to exclude common search word


  • Search "words" - How to exclude common search word

    I am indexing a genealogy-based site. Many words, such as birth, marriage, and death, appear on many of my pages.

    Is there a way to display the words and phrases that are being indexed, so that I can add them to an exclusion list? If so, how?

    Thanks
    Carlton Brooks
    Mesa, Arizona

  • #2
    The words being indexed are the words that appear on your web pages. So by looking at your web pages you'll know the words and phrases.

    Alternatively, you can find a list in the zoom_dictionary.zdat file. But MAKE SURE you don't edit this file, convert it, or save it in a different format.

    ---------
    David


    • #3
      Contains 789780
      only 789790
      basic 789808
      birth 789818
      marriage 800220
      death 807182
      burial 816648
      Excludes 818570
      source 818580
      This is a small section of the zoom_dictionary.zdat file.

      Can you explain what I am seeing? Why do some words have a -1 after them?
      Carlton Brooks
      Mesa, Arizona


      • #4
        These are words that have been indexed. The number after the word is a pointer into the rest of the index files.

        -1 usually means a word is not searchable but can still appear in the context results. This can happen with word case variants, skipped words and a few other cases.
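As a rough illustration of the explanation above, the sketch below separates entries with a -1 pointer from the rest. It assumes each dictionary line is a word and an integer separated by whitespace, as in the excerpt in post #3. The function names and the `Birth -1` sample line are hypothetical and not part of Zoom.

```python
from typing import Iterable, List, Tuple


def parse_dictionary_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'word pointer' lines into (word, pointer) pairs.

    Lines that do not match the assumed two-field layout are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        word, raw_pointer = parts
        try:
            pointer = int(raw_pointer)
        except ValueError:
            continue
        entries.append((word, pointer))
    return entries


def split_by_searchability(entries: List[Tuple[str, int]]):
    """Separate words with a real pointer from words marked with -1."""
    searchable = [word for word, pointer in entries if pointer != -1]
    context_only = [word for word, pointer in entries if pointer == -1]
    return searchable, context_only


# Sample lines: the first two come from post #3, the third is made up.
sample = [
    "birth 789818",
    "marriage 800220",
    "Birth -1",
    "This is not an entry line",
]
entries = parse_dictionary_lines(sample)
searchable, context_only = split_by_searchability(entries)
print(searchable)
print(context_only)
```

To try this on real data, it would be safer to read a copy of the file than the original, given the warning in post #2 about editing or re-saving zoom_dictionary.zdat.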

        I really don't think you are going to get very far searching through the internal index files. There is a lot of compressed binary information designed to be read by a machine and not a human.

        What is it exactly that you are trying to work out?

        -------
        David
        Wrensoft


        • #5
          I am trying to determine what words I can add to my exclude list. Many of the words on my site are repeated on almost every page, e.g. birth, death, marriage.

          But if someone searches for my name, Carlton, they get over 8,000 hits. I want to find out in what combinations my name is being used, so that I can cut down on these results. My name appears on almost every page in a header, a footer, a source note, etc.

          After many years of using a program called Ksearch, I had to eliminate many searches because the search file grew to over 6 MB and browsers could not handle it.

          With Zoom, the data is handled differently and the browser does not bog down, but the number of hits a person gets is still huge.

          Thanks
          Carlton Brooks
          Mesa, Arizona


          • #6
            You shouldn't need to look at the index files to determine what you need to exclude. It would be much better to look at the actual site, e.g. the words that are repeated on every page.
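One way to look at the site itself rather than the index is to count how many pages each word occurs on. The sketch below is a minimal example of that idea. It assumes the page text has already been extracted to plain strings; the function name, the 0.9 threshold, and the sample pages are all made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def words_on_most_pages(pages: Dict[str, str], threshold: float = 0.9) -> List[str]:
    """Return the words found on at least `threshold` of the pages, sorted."""
    page_counts = Counter()
    for text in pages.values():
        # Use a set so each word is counted once per page, not once per use.
        page_counts.update(set(re.findall(r"[a-z']+", text.lower())))
    cutoff = threshold * len(pages)
    return sorted(word for word, n in page_counts.items() if n >= cutoff)


# Hypothetical page text standing in for a real site.
pages = {
    "smith.html": "Birth 1850 Marriage 1872 Death 1910 compiled by Carlton",
    "jones.html": "Birth 1861 Death 1930 compiled by Carlton",
    "index.html": "Surname index compiled by Carlton",
}
print(words_on_most_pages(pages))
```

Words that come out of a report like this are candidates either for a skip list or, if they come from repeated headers and footers, for excluding those page sections instead.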

            However, we have a better way to deal with this. You can use the and tags to specify portions of a page which should be excluded from indexing. Typically, this would be used around headers, footers, navigation menus, etc. - any elements which are repeated on every page, and thus have very little use for searching. For more information, see chapter 6.5 of our users guide:
            http://www.wrensoft.com/zoom/usersguide.html

            This way, instead of skipping common words (and making it impossible to search for them across your site at all), you would be able to search for, say, "Carlton" without getting hits from every page that contains your name in the footer.
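The effect of excluding page sections can be simulated with a short sketch. This is not how Zoom works internally, and the marker strings below are hypothetical placeholders. The real tag names are the ones given in chapter 6.5 of the users guide.

```python
import re

# Hypothetical placeholder markers, not Zoom's actual tag names.
STOP = "<!--INDEX-STOP-->"
RESTART = "<!--INDEX-RESTART-->"


def indexable_text(html: str, stop: str = STOP, restart: str = RESTART) -> str:
    """Drop everything between a stop marker and the next restart marker."""
    pattern = re.escape(stop) + r".*?" + re.escape(restart)
    # DOTALL lets an excluded section span several lines.
    return re.sub(pattern, " ", html, flags=re.DOTALL)


page = (
    "<!--INDEX-STOP--><div>Compiled by Carlton Brooks</div><!--INDEX-RESTART-->"
    "<p>Carlton Smith, birth 1850</p>"
    "<!--INDEX-STOP--><div>Contact Carlton Brooks</div><!--INDEX-RESTART-->"
)
kept = indexable_text(page)
print(kept)
print(kept.count("Carlton"))
```

In this sample, the name in the header and footer is dropped while the one in the page body remains searchable. A stop marker with no matching restart marker is left untouched by this sketch.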
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine


            • #7
              Thanks for the info, it sounds great.
              Carlton Brooks
              Mesa, Arizona
