Feature Request: Ability to Tweak Stemming
I'm really liking the stemming function in V6. But I've found a couple of circumstances where it's giving me undesirable results.
For example, my website has both RA and RAS as abbreviations. Searching for either one finds pages containing either of them, whereas I'd prefer that it only found pages with the one that was searched for.
Another example, I have the word device and the proper name Devic in my site. Because of stemming (I assume), searching for "devic" (no quotes) gives me all the pages that have the word device too.
If feasible, it would be nice if there could be a page akin to synonyms where one could enter/upload stemming exceptions. If I could add specific things to a list that would then not be interpreted as a stem, it would solve this problem.
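To make the idea concrete, here is a minimal Python sketch of how a stemming exception list might behave. It is not Zoom's actual stemmer or configuration format: `toy_stem`, `index_key` and `STEM_EXCEPTIONS` are hypothetical names, and the suffix stripping is only a crude stand-in for a real stemming algorithm. Exempt words get their own "exact" key so they cannot collide with the stem of some other word.

```python
# Hypothetical sketch only: not Zoom's real stemmer or settings format.
STEM_EXCEPTIONS = {"ras", "devic"}  # words the site owner wants matched exactly


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemming algorithm."""
    w = word.lower()
    # Strip a plural-looking trailing "s" (but not "ss").
    if len(w) > 2 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # Strip a trailing "e" so "device" and "devices" share a stem.
    if len(w) > 3 and w.endswith("e"):
        w = w[:-1]
    return w


def index_key(word: str, exceptions=STEM_EXCEPTIONS):
    """Return the key a word is indexed and searched under.

    Exempt words are tagged "exact" so they never share a key with the
    stem of another word (e.g. "devic" vs the stem of "device").
    """
    w = word.lower()
    if w in exceptions:
        return ("exact", w)
    return ("stem", toy_stem(w))


if __name__ == "__main__":
    for term in ("RA", "RAS", "device", "devices", "Devic"):
        print(term, index_key(term))
```

With the list in place, RAS no longer shares a key with RA and Devic no longer shares one with device, while device and devices still match each other.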
Related to this feature request -- to tweak stemming -- is my desire to tweak Synonyms. Misspellings and synonyms are entered in the same Zoom Search Engine entry box. For example, I have entered these variants in Zoom Search Engine:
raisin = raison,rasian,rasain,rasan
If I do a search for raisian, I get: "Did you mean: rasian?"
If I do a search for rason, I get: "Did you mean: raisins or raison or rasain?"
The suggested word must never be a misspelling! It should only suggest the correct spelling, raisin. The Synonyms module seems to think rasian, rasain and raison are properly spelled synonyms for raisin. We need to be able to make a distinction between misspellings and correctly spelled synonyms.
Good points above.
Yes, that is a result of stemming, and one of the downsides to using it. A list of words to exempt from stemming is a good idea, and likely something we could add for a V6.1 release or similar. We're trying to find a balance between having an overwhelming number of features that the majority of users don't know what to do with, and having things "just work". Most people wouldn't understand that this behaviour was caused by stemming, so they wouldn't even look for a stemming exemption list to enter words in.
Originally Posted by aschecht
It would be nice if we could automatically determine this based on upper/lower-casing, but that also makes a whole bunch of assumptions about how a user will type in names and abbreviations. We'll look into it in any case.
Yes, that is right, and yes, there are also synonyms which are correctly spelled, so we don't want to remove them from the suggestions altogether.
Originally Posted by rschletty
In your particular example, you probably shouldn't need to have "raison", "rasain", or "rasan" as synonyms, because they are all automatically determined as spelling mistakes of "raisin".
Note that the Spelling Suggestions feature automatically associates similar words (based on the phonetic sound of the word), so you should not need to enter so many misspelt words as synonyms, except maybe for a smaller group of words which could not be automatically associated. I think this should minimize the problem at least.
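As a rough illustration of phonetic matching, the Python sketch below uses classic Soundex. The thread does not say which phonetic algorithm Zoom uses, so Soundex is an assumption chosen only because it is well known, and `soundex` is a hypothetical helper written for this example. Under Soundex, all the raisin variants mentioned in this thread receive the same code as raisin itself.

```python
# Illustrative only: Soundex is a stand-in, not necessarily Zoom's algorithm.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for w in ("raisin", "raison", "rasian", "rasain", "rasan", "rason"):
        print(w, soundex(w))
```

Because the variants collapse to one code, a phonetic matcher of this kind could associate them with raisin without any manual synonym entries.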
I think you need three columns in the Zoom Synonyms custom entry tool: correct word, synonyms and misspellings.
Then, the module would pull suggestions only from correct words, synonyms, phonetic library and stemming library -- but never from misspellings.
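A minimal Python sketch of that three-column idea follows. Everything in it is hypothetical (`SynonymEntry`, `build_lookup`, `did_you_mean`) and it is not how Zoom stores or processes synonyms. The synonym "sultana" is invented for illustration; only the misspellings come from the entry quoted earlier in this thread. Every variant is searchable, but suggestions are drawn from the key term and the synonyms only.

```python
from dataclasses import dataclass, field


@dataclass
class SynonymEntry:
    """One row of the proposed table: key term | synonyms | misspellings."""
    key_term: str
    synonyms: list = field(default_factory=list)
    misspellings: list = field(default_factory=list)


def build_lookup(entries):
    """Map every searchable variant (including misspellings) to its row."""
    lookup = {}
    for entry in entries:
        for variant in [entry.key_term, *entry.synonyms, *entry.misspellings]:
            lookup[variant.lower()] = entry
    return lookup


def did_you_mean(query, lookup):
    """Suggest only correctly spelled words, never a listed misspelling."""
    entry = lookup.get(query.lower())
    if entry is None:
        return []
    candidates = [entry.key_term, *entry.synonyms]
    return [c for c in candidates if c.lower() != query.lower()]


ENTRIES = [
    SynonymEntry(
        key_term="raisin",
        synonyms=["sultana"],  # invented example synonym
        misspellings=["raison", "rasian", "rasain", "rasan"],
    )
]
LOOKUP = build_lookup(ENTRIES)

if __name__ == "__main__":
    print(did_you_mean("rasian", LOOKUP))
    print(did_you_mean("raisin", LOOKUP))
```

Keeping misspellings in their own column means they still route a search to the right entry, but they can never be offered back to the user as a suggestion.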
I understand that raison is French for reason, but I don't think that's why it presented that word as a suggested option. And I was pretty sure that I tested variants of raisin before entering my misspellings. I'll take out my entries and test again.
Thanks for taking this feature request into consideration. I agree that most users would never touch it but it would be a nice piece of fine tuning for users who wanted to take the time to add finishing touches.
I like rschletty's suggestion about the Synonym tweaking as well. Many of the terms on my synonym lists are misspellings of proper names and it's suboptimal when they show up on the Did you mean . . . list. I like the idea of three columns:
key term | synonyms | misspellings
where only the synonym matches would be eligible to appear as Did you mean . . . suggestions.
I populate my Synonyms list with search terms that went unfound, as determined by reviewing my search logs. This means that I don't bother making synonyms for misspelled words unless they result in no hits.
+1 vote on improved support for stemming/synonyms.
I was surprised to find that I can't use phrases for synonyms. How then can I create synonyms for acronyms? For example, how could I create a synonym for something like PTO=Paid Time Off ??
That doesn't appear to be possible.
I agree with Dan. I was disappointed to see I could not enter a phrase in Synonyms.
May I add my voice to a request for an ability to tweak stemming?
I like the way it works most of the time, but occasionally English can be a real pain - "news" is not the plural of "new" (nor "new" the singular of "news").
We're thinking of adding a user-specifiable list of words to exempt from stemming. We'll note your request. Thanks for the input.