Search "words" - How to exclude common search word
carltonb
02-21-2005, 05:41 PM
I am indexing a genealogy site. Many words, such as birth, marriage, and death, appear on most of my pages.
Is there a way to display the words and phrases that are being indexed, so that I can add them to an exclusion list? If so, how?
Thanks
Carlton Brooks
wrensoft
02-21-2005, 06:59 PM
The words being indexed are the words that appear on your web pages. So by looking at your web pages you'll know the words and phrases.
Otherwise, you can also find a list in the zoom_dictionary.zdat file. But MAKE SURE you don't edit this file, convert it, or save it in a different format.
---------
David
carltonb
02-21-2005, 10:00 PM
Contains 789780
only 789790
basic 789808
birth 789818
marriage 800220
death 807182
burial 816648
Excludes 818570
source 818580
This is a small section of the zoom_dictionary.zdat file.
Can you explain what I am seeing? Why do some words have a -1 behind them?
wrensoft
02-21-2005, 10:40 PM
These are words that have been indexed. The number after the word is a pointer into the rest of the index files.
-1 usually means a word is not searchable but can still appear in the context results. This can happen with word case variants, skipped words and a few other cases.
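As a rough illustration only, here is a minimal Python sketch that reads a pasted excerpt like the one above (never the .zdat file itself) and lists the words whose pointer is -1. The function names are hypothetical, the "word number" layout is assumed from the excerpt in this thread, and the "Death -1" line is made up for the example.

```python
def parse_dictionary_excerpt(text):
    """Split 'word pointer' lines into (word, pointer) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not 'word number'
        word, raw_pointer = parts
        try:
            pointer = int(raw_pointer)
        except ValueError:
            continue  # second field is not a number
        entries.append((word, pointer))
    return entries


def unsearchable_words(entries):
    """Return words whose pointer is -1 (assumed: not searchable)."""
    return [word for word, pointer in entries if pointer == -1]


# Pasted text, not the real index file. The last line is hypothetical.
excerpt = """birth 789818
marriage 800220
Death -1
"""

entries = parse_dictionary_excerpt(excerpt)
print(unsearchable_words(entries))
```

This works on a copy of the text only, so the index files stay untouched.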
I really don't think you are going to get very far searching through the internal index files. There is a lot of compressed binary information designed to be read by a machine and not a human.
What is it exactly that you are trying to work out?
-------
David
Wrensoft
carltonb
02-22-2005, 10:45 PM
I am trying to determine what words I can add to my exclude list. Many of the words on my site are repeated on almost every page, e.g. birth, death, marriage.
But if someone searches for my name, Carlton, they get over 8,000 hits. I want to find out in what combinations my name is being used, so that I can cut down on these hits. My name appears on almost every page as a header, footer, in a source note, etc.
After using a program called Ksearch for many years, I had to eliminate many searches because the search file grew to over 6 MB and browsers could not handle it.
Zoom handles the data differently, so the browser does not bog down, but the number of hits a person gets is still huge.
Thanks
You shouldn't need to look at the index files to determine what you need to exclude. It would be much better to look at the actual site, eg. the words that appear repeated on every page, etc.
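If the site is too large to check by eye, a small script run over the pages themselves can do the counting. The following is only a sketch under stated assumptions: it is not part of Zoom, the function names and the `min_share` threshold are hypothetical, and the two sample pages are invented. It counts on how many pages each word appears and reports the words found on at least a given share of them.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_words(html):
    """Return the set of lowercase words that appear in one page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z']+", " ".join(parser.chunks).lower()))


def words_on_most_pages(pages, min_share=0.9):
    """Return words found on at least min_share of the pages, sorted."""
    counts = Counter()
    for html in pages:
        counts.update(page_words(html))  # each page counts a word once
    needed = min_share * len(pages)
    return sorted(word for word, n in counts.items() if n >= needed)


# Invented sample pages; in practice, read your own HTML files here.
pages = [
    "<html><body><h1>Smith family</h1><p>Birth 1850, death 1901</p></body></html>",
    "<html><body><h1>Jones family</h1><p>Birth 1872, marriage 1895</p></body></html>",
]
print(words_on_most_pages(pages, min_share=1.0))  # ['birth', 'family']
```

Note that this simple extractor also picks up text inside script and style blocks, so treat the output as a list of candidates to review, not a finished skip list.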
However, we have a better way to deal with this. You can use the <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags to specify portions of a page which should be excluded from indexing. Typically, this would be used around headers, footers, navigation menus, etc. - any elements which are repeated on every page, and thus have very little use for searching. For more information, see chapter 6.5 of our users guide:
http://www.wrensoft.com/zoom/usersguide.html
This way, instead of skipping common words (and making it impossible to search for them anywhere on your site), you would be able to search for, say, "Carlton" without getting hits from every page that contains your name in the footer.
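To make the idea concrete, here is a small Python sketch, assuming the tags meant here are Zoom's <!--ZOOMSTOP--> and <!--ZOOMRESTART--> HTML comments. The sample page and the `strip_excluded` helper are hypothetical, and the helper only approximates what would be left for indexing; it is not how Zoom itself processes pages.

```python
import re

# Matches everything from a ZOOMSTOP comment to the next ZOOMRESTART comment.
_EXCLUDED = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->", re.DOTALL)


def strip_excluded(html):
    """Preview the page with the marked regions removed (approximation)."""
    return _EXCLUDED.sub("", html)


# Invented page: header and footer are wrapped, the record itself is not.
page = """<html><body>
<!--ZOOMSTOP-->
<div class="header">Site maintained by Carlton Brooks</div>
<!--ZOOMRESTART-->
<p>John Smith, birth 1850, death 1901</p>
<!--ZOOMSTOP-->
<div class="footer">Contact Carlton Brooks</div>
<!--ZOOMRESTART-->
</body></html>"""

remaining = strip_excluded(page)
print("Carlton" in remaining)     # False: header and footer are excluded
print("John Smith" in remaining)  # True: the record text is still there
```

A page where "Carlton" appears in the body text, outside the marked regions, would still be found by a search for that name.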
carltonb
02-24-2005, 11:12 PM
Thanks for the info. It sounds great.