I'm using the Professional edition of Zoom 4.2.
I have indexed some pages that contain the words 'directive' and 'directives'.
I'm surprised that a search for the keyword 'directive' doesn't find the pages that contain only the term 'directives'.
I know there are some alternatives: using '?' ('directive?' gives good results in every case); using '*' ('directive*' finds all the pages, but pages containing both 'directives' and 'directive' are ranked ahead of a page with only one of the terms but a better score); or enabling the 'substring matches for all searches' option (same results as with '*').
I think an end user will search for either the singular or the plural form of a keyword, and it is not a reflex for them to add '?' to the end of every keyword.
Do you have a solution for version 4.2, or have you planned one for a future version?
Apart from this problem, your software is very good.
A labor-intensive workaround is to use the synonym list. If there are only a few terms for which you want a search to find both singular and plural, add them to the synonyms on the Zoom configuration page. One bug with this method is that neither the highlighting nor the jump-to-highlight feature works: pages containing the alternate form of the word are returned in the search results, but the hits are not highlighted.
There is no perfect solution in this version of the software. As pointed out, synonyms can be added for your key words. Another way is to add all word variants in the meta data for the page.
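To make the synonym workaround concrete, here is a minimal sketch of query expansion through a synonym table. It only illustrates the idea and is not how Zoom stores or applies synonyms; `SYNONYMS`, `expand_query` and `page_matches` are hypothetical names made up for this example.

```python
# Hypothetical synonym table; this is NOT Zoom's configuration format.
SYNONYMS = {
    "directive": {"directives"},
    "directives": {"directive"},
}

def expand_query(words, synonyms=SYNONYMS):
    """Return, for each query word, the sorted list of the word plus its synonyms."""
    expanded = []
    for word in words:
        key = word.lower()
        variants = {key} | synonyms.get(key, set())
        expanded.append(sorted(variants))
    return expanded

def page_matches(page_words, expanded_query):
    """A page matches when every query word, or one of its synonyms, appears on it."""
    page_set = {w.lower() for w in page_words}
    return all(any(v in page_set for v in variants) for variants in expanded_query)
```

With this table, a query for 'directive' also matches a page that contains only 'directives'. Every pair has to be entered by hand, which is why this is only practical for a few terms.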
There is no easy technical solution, and even search engines like Google don't do it. Try a search for 'directive' on Google.
There are several technical problems preventing an easy solution. These are mostly the result of crazy grammatical features in the English language.
Building all this into a script would need a lot of code to cover the rules, exceptions and many one-off cases. We would probably need to include a full English dictionary in the scripts.
Some examples from Damian Conway's page on the subject:
Nouns that end in -ss universally become -sses in the plural (and vice versa for verbs). Likewise, nouns which end in a consonant followed by -y almost always become -ies in the plural.
Certain types of adjectives also inflect in the plural. For example, possessive adjectives that end in -'s or -' in the singular, are made plural by forming the plural of the root word and appending an apostrophe (unless the root's plural does not itself end in -s, in which case -'s is appended). Hence cat's becomes cats', mantis' becomes mantises', whilst child's becomes children's.
Other suffix categories arise because words of foreign origin (most commonly Ancient Greek or Latin) have retained a non-anglicized plural inflection. Hence criterion becomes criteria, nucleus becomes nuclei, and matrix becomes matrices.
The correct inflection of words derived from Latin can be particularly complex, since the same suffix may form different Latinate plurals depending on the declension (or sometimes the part of speech) of the original. Thus the plural of stimulus (second declension) is stimuli, and that of genus (third declension) is genera. Status (fourth declension) is traditionally unchanged in the plural, whilst ignoramus (a first person plural Latin verb) has been wholly anglicized and becomes ignoramuses.
And as our search engine provides support for many languages (French, German, Italian, Danish, Croatian, Swedish, Norwegian, Spanish, Dutch, Japanese, Russian, and Portuguese), it would surely do the wrong thing when faced with any non-English text and return totally incorrect results.
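To give a feel for how quickly the rules pile up, here is a rough sketch of a pluralizer that knows only the suffix rules quoted above plus a small exception table. The function name `naive_plural` and the table `IRREGULAR` are made up for this illustration; the table holds just the examples from this thread, whereas a usable one would be closer to a dictionary.

```python
import re

# A few irregular forms taken from the examples above; a real list would be far longer.
IRREGULAR = {
    "child": "children",
    "criterion": "criteria",
    "nucleus": "nuclei",
    "matrix": "matrices",
    "stimulus": "stimuli",
    "genus": "genera",
    "status": "status",
    "ignoramus": "ignoramuses",
}

def naive_plural(noun):
    """Pluralize using the exception table first, then three suffix rules."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    if noun.endswith("ss"):
        return noun + "es"            # class -> classes
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"      # ferry -> ferries
    return noun + "s"                 # directive -> directives
```

Any word missing from the table falls through to the default rule, so `naive_plural("man")` gives 'mans'. Every such case needs its own entry, and none of this carries over to other languages.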
I agree with you about the difficulties (for example with child's or children's).
I think the heart of the problem is not to search for every plural form in the dictionary, but simply to understand the behaviour of the end user.
When the plural is formed just by adding an 's' to the word, the end user does not think to search for both the singular and the plural, because to them it is the same word.
When the plural form is very different from the singular, they may think to run more than one search.
Maybe a future version of Zoom could add a 'plural parameter option' that handles a single-letter suffix for most words (like 's').
In this case, you could search for the base words and for the same words with 's' appended, and count a word with or without the 's' as a single matched term. For example, a search for 'man' would look for 'man' and 'mans' but not 'men'; the results will simply not find 'mans', which is no problem. A search for 'cat' would look for 'cat' or 'cats', and the results page would treat 'cat' and 'cats' as the same term.
With this option, you could perhaps cover 80% of singular/plural searches in a lot of languages. For the other 20%, the end user will probably make the distinction themselves when the plural is more complicated (for example, by using both 'man' and 'men').
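The suggestion above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing Zoom option; `s_variants` and `count_term` are hypothetical names, and stripping a trailing 's' from the query (so that 'cats' also finds 'cat') is an assumption added here.

```python
from collections import Counter

def s_variants(word):
    """Return the word with and without one trailing 's' (the proposed one-letter rule)."""
    word = word.lower()
    base = word[:-1] if word.endswith("s") and len(word) > 1 else word
    return {base, base + "s"}

def count_term(page_words, query_word):
    """Count hits for the query word, treating 'cat' and 'cats' as one term."""
    counts = Counter(w.lower() for w in page_words)
    # Counter returns 0 for words that are not on the page.
    return sum(counts[v] for v in s_variants(query_word))
```

A search for 'cat' on a page containing 'cat' and 'cats' counts two hits for a single term, while a search for 'man' still ignores 'men'.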
As David mentioned, there are other languages we support where the above does not apply. In addition to this, there are a great number of words where a simple additional "s" would not work, and I think the percentage that this would help is much less than 80% if you consider this and the number of words like "searches", "ferry"/"ferries", etc.
The major disadvantage in your suggestion is that it will return incorrect results. For example, if I'm looking for the acronym "ASP", it will have to also return matches for "ASPS", which could be a totally different acronym or name. Also, there is the extra processing time wasted to look for each word with an additional 's', even when the user does not need to do so (eg. when searching for a name).
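A quick way to see both objections is to run a bare "add an s" rule over the examples mentioned in this thread. The helper `matches_with_s_rule` below is a hypothetical stand-in for the proposed option, not anything in Zoom.

```python
def matches_with_s_rule(query, indexed_word):
    """True when the indexed word is the query itself or the query plus a trailing 's'."""
    q, w = query.lower(), indexed_word.lower()
    return w == q or w == q + "s"

CASES = [
    ("directive", "directives"),  # wanted match
    ("search", "searches"),       # wanted match, but the plural adds 'es'
    ("ferry", "ferries"),         # wanted match, but 'y' becomes 'ies'
    ("ASP", "ASPS"),              # unwanted: possibly a different acronym
]

results = {pair: matches_with_s_rule(*pair) for pair in CASES}
```

Here `results` is true for 'directive'/'directives' and for 'ASP'/'ASPS', and false for the other two pairs, so the rule both misses real plurals and adds a false match.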
We would like to implement something to address this issue sometime in the future, but we would be interested in something that provides a more reliable set of results, rather than implementing something which causes problems elsewhere.
As mentioned before, synonyms and meta words can help. Adding tips on your search page to encourage the use of wildcards is another of the currently available solutions.
Wrensoft Web Software
Zoom Search Engine
Any thoughts on how I could get "San Francisco" to equal SFO?
Synonyms would be a solution, but you can't have a word set to equal a phrase. You can only do word to word.
So what if you made a synonym of
SFO = Francisco
It wouldn't be perfect, but it would give the right result a lot of the time.
The other option is to add both terms in the page meta data.
Is there any performance penalty involved in using a large number of synonyms? (For the sake of argument, let's say I'm using 500.)
There is no significant performance penalty in using a long list of synonyms.
The exception might be if you actually have more synonyms than the number of words indexed, in which case it would obviously be somewhat more noticeable. Pretty unusual case, but we often get users who are doing pretty unusual things.
Update Sept 2008: We have added Stemming support for the upcoming V6 release which will help address the above issue. Please see this thread for information.
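For readers unfamiliar with the term, stemming reduces both the indexed words and the query to a common root before comparing them. The sketch below is a toy illustration of that idea and is not the V6 implementation; the suffix rules in `light_stem` and the helpers `build_index` and `search` are assumptions made for this example.

```python
def light_stem(word):
    """Strip a few common English plural suffixes; illustrative rules only."""
    w = word.lower()
    if len(w) <= 3:
        return w
    if w.endswith("ies"):
        return w[:-3] + "y"                        # ferries -> ferry
    if w.endswith(("sses", "ches", "shes", "xes")):
        return w[:-2]                              # searches -> search
    if w.endswith("s") and not w.endswith(("ss", "us")):
        return w[:-1]                              # directives -> directive
    return w

def build_index(pages):
    """Map each stem to the set of page names containing a word with that stem."""
    index = {}
    for name, text in pages.items():
        for word in text.split():
            index.setdefault(light_stem(word), set()).add(name)
    return index

def search(index, query):
    """Look up the stemmed query in the stemmed index."""
    return index.get(light_stem(query), set())
```

Because the same function is applied at index time and at search time, 'directive' and 'directives' land on the same entry. Irregular forms such as 'child'/'children' are still not matched by rules this simple, and 'ASP'/'ASPS' are still conflated.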