Stemming now added to V6!
We can now confirm that V6 will feature STEMMING.
This is a much-requested feature. When enabled, search results will match similar words and words that are derived from one another (e.g. plurals). For example, searching for the word "fish" will return pages containing variants such as "fish", "fishes", "fishing", etc.
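To illustrate the general idea of stemming, here is a minimal Python sketch. It is not the algorithm Zoom uses, which is not described in this post. The function names `toy_stem` and `matches` and the suffix list are assumptions made up for this example. It strips a few common English suffixes so that related word forms reduce to the same stem.

```python
# Toy suffix-stripping stemmer for illustration only; NOT Zoom's algorithm.
SUFFIXES = ("ing", "es", "s")  # hypothetical list, checked in this order


def toy_stem(word):
    """Lowercase the word and strip at most one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, page_words):
    """Return the page words that share a stem with the query."""
    target = toy_stem(query)
    return [w for w in page_words if toy_stem(w) == target]


if __name__ == "__main__":
    print(matches("fish", ["fish", "fishes", "fishing", "dish"]))
```

A real stemmer handles far more cases (irregular forms, doubled consonants, language-specific rules), which is part of why stemming is so language dependent.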
Adding this feature required some significant changes to the index file format and the way we index and search words, but we are glad to see that the end results seem to be worth the effort.
The feature will be enabled by default in V6. However, you may want to turn it off if, for example, it is absolutely critical that your website differentiates between "booking", "booker", "book", etc.
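The trade-off can be sketched with a tiny inverted index in Python. This is only an illustration under assumed names: `toy_stem`, `build_index`, `search`, the `use_stemming` flag and the sample pages are all hypothetical and do not reflect Zoom's index format or settings.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips "ing" or "er".
    for suffix in ("ing", "er"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages, use_stemming):
    """Map each (optionally stemmed) word to the set of pages containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            key = toy_stem(word) if use_stemming else word
            index[key].add(name)
    return index


def search(index, query, use_stemming):
    """Look up the query, stemming it the same way the index was built."""
    key = query.lower()
    if use_stemming:
        key = toy_stem(key)
    return sorted(index.get(key, set()))


# Made-up sample pages for the example.
PAGES = {
    "a.html": "book a table",
    "b.html": "online booking form",
    "c.html": "contact the booker",
}

if __name__ == "__main__":
    for flag in (True, False):
        print(flag, search(build_index(PAGES, flag), "book", flag))
```

With stemming on, a search for "book" finds all three pages. With it off, only the page containing the exact word "book" is returned, which is the behaviour you would want if those words must stay distinct.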
More information on V6 here.
Stemming and single-case languages
I notice that stemming is disabled when "support for single-case languages (ie asian)" is enabled.
Is this intentional? I can't use both?
The stemming algorithm is very language-dependent. It doesn't make sense for most Asian languages, where words are not inflected for things such as plurals or verb tenses.
I would like to know which 16 languages stemming works for. I couldn't find the list in the article. I think it's a great feature.
They are listed in the Languages window of the Zoom configuration. (You need to select the CGI script option first, however.)
Hi, could you please tell us whether V6 supports the Russian language?
Russian is supported with a few minor exceptions. See,
For Russian stemming you need to use the CGI option.
Does that mean that the stemming function does not work for Chinese either?
There is no stemming functionality for Chinese, and linguistically I don't see how it would work. There are no plural or singular forms of words, nor present and past tenses, in Chinese or in most Asian languages that we are aware of.