Thread: Problems with macrons and Chinese characters

  1. #1
    Join Date
    Jan 2009
    Posts
    14

    Problems with macrons and Chinese characters

    I've got V6 trial working, but macrons and Chinese characters are not supported. I've read the International Language Support page but I'm still unable to get things to work. When I enter a special character into the search box, it's converted into a question mark.

    All our pages have charset UTF-8 (including the search page) and I've put the indexer into Unicode mode.

    All characters are entered as Unicode numeric character references, i.e. 欢迎 is &#27426;&#36814;. I'm not sure if this makes a difference.
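
    For reference, a numeric character reference names the same code point as the literal character, so a parser that decodes references should see identical text either way. The following is a minimal Python sketch of that equivalence only; it is not Zoom's code and says nothing about how the indexer actually handles references.

```python
import html

# Numeric character references as they would appear in the page source.
encoded = "&#27426;&#36814;"

# html.unescape decodes decimal references into the characters they name.
decoded = html.unescape(encoded)

# Show the decoded text and the decimal code point of each character.
print(decoded)
print([ord(c) for c in decoded])
```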

    Anyone have any suggestions?
    Last edited by jellybean; 01-13-2009 at 08:32 PM.

  2. #2
    Join Date
    Dec 2004
    Location
    Sydney
    Posts
    4,156

    Chinese characters are supported when using UTF-8.
    What is the URL of your search function so we can take a look at the problem?

  3. #3
    Join Date
    Jan 2009
    Posts
    14

    Sorry! It's on an internal server without outside access.
    I can provide screenshots though... would that help?

  4. #4
    Join Date
    Jan 2009
    Posts
    14

    I can describe what happens here...

    One of our pages has the phrase:

    He aha ō mahi i tēnei rā?

    Here is what the search results show when I enter the keyword t?nei:

    He aha ö mahi i tënei rā?

    All the special characters have been garbled. It looks like it's being encoded funny when it's indexed.

    I hope this is more descriptive. Sorry I can't provide a link.

    edit: When I search for tenei, it returns the correct result, but still with the garbled special characters. When I search for 欢迎, which appears on one of our pages, it is somehow reduced to ?? and all results come back. In the URI, the zoom_query parameter is still 欢迎. This is the same as for macron characters (umlauts are OK since I think they fall in the standard ISO Latin character set).
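
    One possible explanation for the question marks, offered as an assumption and not confirmed anywhere in this thread, is that the query passes through a single-byte Latin encoding somewhere between the form and the search script. The Python sketch below only illustrates that effect; `to_latin1_text` is a hypothetical helper and has nothing to do with Zoom's own code.

```python
def to_latin1_text(query: str) -> str:
    """Round-trip text through ISO-8859-1, replacing unencodable characters.

    Characters outside ISO-8859-1 cannot be encoded, so errors="replace"
    substitutes a question mark for each of them.
    """
    return query.encode("latin-1", errors="replace").decode("latin-1")


# A macron word, a Chinese word, and an umlaut word for comparison.
for q in ["tēnei", "欢迎", "über"]:
    print(q, "->", to_latin1_text(q))
```

    Under that assumption, macrons and Chinese characters are lost while umlauts survive, which matches the behaviour described above.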

  5. #5
    Join Date
    Dec 2004
    Location
    Sydney
    Posts
    4,156

    If the site is internal and can't be accessed, can you send us:
    1) The Zoom configuration file you are using (called something like xxxxxxx.zcfg)
    2) The HTML code from the page you are indexing that contains the Chinese characters
    3) Your search_template.html file

    Also, can you tell us what type of server you are using (Windows or Linux) and, if you know it, what the default language for the server is? E.g. is it the Chinese version of Windows, or the English version?

  6. #6
    Join Date
    Jan 2009
    Posts
    14

    Works now! Except for one niggly thing...

    Thanks so much for your help. The new version handles Chinese characters perfectly.

    The one thing that remains... I notice that even if I select "enable accent/diacritic insensitivity" for all three (accents, umlauts and ligatures) it doesn't do the same for macrons.

    Would it be possible to add macron insensitivity?
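
    To illustrate what macron insensitivity could mean, here is a minimal Python sketch using Unicode normalization. It is an assumption about one way to do it, not an existing Zoom option; `strip_macrons` is a hypothetical helper that would need to be applied to both the indexed words and the search query.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(text: str) -> str:
    """Remove macrons only, leaving other diacritics such as umlauts intact."""
    # NFD splits each precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop just the combining macron.
    without_macrons = decomposed.replace(MACRON, "")
    # NFC recomposes whatever marks remain (e.g. o + diaeresis back into one letter).
    return unicodedata.normalize("NFC", without_macrons)


print(strip_macrons("He aha ō mahi i tēnei rā?"))
```

    With both sides folded this way, a search for tenei and a search for tēnei would match the same indexed word.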

  7. #7
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,573

    We'll add that to our list of things to consider for a future release. We're not really familiar with macrons at this point (the tricky thing with what we do is we have to look into how it might affect more than one language), so we would need some time to research before we decide to add it or not. But thanks for the suggestion.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
