Problems with macrons and Chinese characters
jellybean
01-13-2009, 08:20 PM
I've got V6 trial working, but macrons and Chinese characters are not supported. I've read the International Language Support page but I'm still unable to get things to work. When I enter a special character into the search box, it's converted into a question mark.
All our pages have charset UTF-8 (including the search page) and I've put the indexer into Unicode mode.
All characters are entered via Unicode, i.e. 欢迎 is &#27426;&#36814; - I'm not sure if this makes a difference.
Anyone have any suggestions?
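Note: a minimal sketch, assuming "entered via Unicode" above means HTML numeric character references written in decimal (an assumption; the original markup is not shown). It uses Python's standard html module only to show that such references name the same code points as the literal characters; it says nothing about how Zoom itself parses them.

```python
import html

# Assumed form of the page markup: decimal numeric character references
# for U+6B22 and U+8FCE.
encoded = "&#27426;&#36814;"

# html.unescape turns the references back into the characters they name.
decoded = html.unescape(encoded)

print(decoded)
print([hex(ord(ch)) for ch in decoded])
```

If this holds, a page written with references and a page written with literal UTF-8 characters should carry the same text once the references are decoded.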
wrensoft
01-13-2009, 08:28 PM
Chinese characters are supported when using UTF-8.
What is the URL of your search function so we can take a look at the problem?
jellybean
01-13-2009, 08:31 PM
Sorry! It's on an internal server without outside access :(
I can provide screenshots though... would that help?
jellybean
01-13-2009, 08:42 PM
I can describe what happens here...
One of our pages has the phrase:
He aha ō mahi i tēnei rā?
When I search for the keyword t?nei, the search results show:
He aha ö mahi i tënei rā?
All the special characters have been garbled. It looks like it's being encoded funny when it's indexed.
I hope this is more descriptive. Sorry I can't provide a link :(
edit: When I search for tenei, it returns the correct result, but still with the garbled special characters. When I search for 欢迎, which appears on one of our pages, it is somehow reduced to ?? and all results come back. In the URI, the zoom_query parameter is still 欢迎. This is the same as for macron characters (umlauts are OK since I think they fall in the standard ISO Latin character set).
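Note: one possible explanation for the question marks, sketched in Python. This assumes that something in the chain converts the query to ISO-8859-1 (Latin-1); the thread does not confirm that this is what happens, and to_latin1 is a hypothetical helper, not part of Zoom.

```python
def to_latin1(text: str) -> str:
    """Force text through ISO-8859-1, replacing anything it cannot hold."""
    # errors="replace" substitutes "?" for every unencodable character.
    return text.encode("latin-1", errors="replace").decode("latin-1")

print(to_latin1("欢迎"))   # Chinese characters have no Latin-1 equivalent
print(to_latin1("tēnei"))  # e with macron (U+0113) is outside Latin-1
print(to_latin1("für"))    # u with umlaut (U+00FC) is inside Latin-1
```

Under that assumption, umlauts would survive while macrons and Chinese characters turn into question marks, which matches the behaviour described above.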
wrensoft
01-13-2009, 09:44 PM
If the site is internal and can't be accessed, can you send us (http://www.wrensoft.com/contactus.html),
1) The Zoom configuration file you are using (called something like, xxxxxxx.zcfg)
2) The HTML code from the page you are indexing that contains the Chinese characters
3) Your search_template.html file
Also, can you tell us what type of server you are using (Windows or Linux) and, if you know it, what the default language for the server is? E.g. is it the Chinese version of Windows, or the English version?
jellybean
01-19-2009, 08:01 PM
Thanks so much for your help. The new version handles Chinese characters perfectly.
The one thing that remains... I notice that even if I select "enable accent/diacritic insensitivity" for all three (accents, umlauts and ligatures) it doesn't do the same for macrons.
Would it be possible to add macron insensitivity?
We'll add that to our list of things to consider for a future release. We're not really familiar with macrons at this point (the tricky thing with what we do is we have to look into how it might affect more than one language), so we would need some time to research before we decide to add it or not. But thanks for the suggestion.
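Note: until something like this is built in, macron folding can be approximated outside the search engine. The following is a minimal sketch in Python, assuming it is acceptable to treat ā, ē, ī, ō, ū as their plain vowels; strip_macrons is a hypothetical helper and not a Zoom feature.

```python
import unicodedata

COMBINING_MACRON = "\u0304"

def strip_macrons(text: str) -> str:
    """Remove macrons only, leaving other diacritics untouched."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # NFC recomposes whatever marks remain (e.g. umlauts).
    return unicodedata.normalize("NFC", without_macrons)

print(strip_macrons("He aha ō mahi i tēnei rā?"))
```

Applying the same folding to both the indexed text and the query would let tenei and tēnei match each other, though whether that is appropriate for every language is exactly the open question raised above.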