Problems with macrons and Chinese characters

  • Problems with macrons and Chinese characters

    I've got the V6 trial working, but macrons and Chinese characters don't seem to be supported. I've read the International Language Support page but I still can't get things to work. When I enter a special character into the search box, it's converted into a question mark.

    All our pages have charset UTF-8 (including the search page) and I've put the indexer into Unicode mode.

    All special characters are entered in the page source as numeric character references (i.e. 欢迎 is &#27426;&#36814;). I'm not sure if this makes a difference.
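For reference, a numeric character reference is just the decimal Unicode code point of the character, so a browser should render it the same as the literal character. Whether an indexer treats the two forms identically is a separate question. A quick way to check the mapping outside the browser is Python's standard html module (Python is used here purely for illustration and has nothing to do with Zoom itself):

```python
import html

# Numeric character references as they would appear in the page source.
encoded = "&#27426;&#36814;"

# html.unescape turns each &#NNNNN; reference into the character
# with that decimal code point.
decoded = html.unescape(encoded)

print(decoded)                      # 欢迎
print([ord(ch) for ch in decoded])  # [27426, 36814]
```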

    Anyone have any suggestions?
    Last edited by jellybean; Jan-13-2009, 09:32 PM.

  • #2
    Chinese characters are supported when using UTF-8.
    What is the URL of your search function, so we can take a look at the problem?

    • #3
      Sorry! It's on an internal server without outside access
      I can provide screenshots though... would that help?

      • #4
        I can describe what happens here...

        One of our pages has the phrase:

        He aha ō mahi i tēnei rā?

        Here is what the search results show when I provide the keyword t?nei:

        He aha ö mahi i tënei rā?

        All the special characters have been garbled. It looks like it's being encoded funny when it's indexed.

        I hope this is more descriptive. Sorry I can't provide a link

        edit: When I search for tenei, it returns the correct result, but still with the garbled special characters. When I search for 欢迎, which appears on one of our pages, it is somehow reduced to ?? and all results come back. In the URI, the zoom_query parameter is still 欢迎. This is the same as for macron characters (umlauts are OK since I think they fall in the standard ISO Latin character set).
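The question marks are consistent with the query being squeezed into a single-byte Latin character set somewhere between the search form and the search script. That is only a guess at the cause, not something confirmed for Zoom, but the symptom is easy to reproduce. A minimal sketch in Python (the helper name to_latin1 is made up for this illustration):

```python
def to_latin1(text: str) -> str:
    """Round-trip text through ISO-8859-1, replacing anything that does not fit."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

print(to_latin1("tēnei"))  # t?nei (ē is not in ISO-8859-1)
print(to_latin1("欢迎"))    # ??
print(to_latin1("über"))   # über (umlauts are in ISO-8859-1, so they survive)
```

This matches what is described above: macrons and Chinese characters turn into ?, while umlauts come through untouched.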

        • #5
          If the site is internal and can't be accessed, can you send us:
          1) The Zoom configuration file you are using (called something like xxxxxxx.zcfg)
          2) The HTML code from the page you are indexing that contains the Chinese characters
          3) Your search_template.html file

          Also, can you tell us what type of server you are using (Windows or Linux) and, if you know it, what the default language of the server is, e.g. is it the Chinese version of Windows or the English version?

          • #6
            Works now! Except for one niggly thing...

            Thanks so much for your help. The new version handles Chinese characters perfectly.

            The one thing that remains... I notice that even if I select "enable accent/diacritic insensitivity" for all three (accents, umlauts and ligatures) it doesn't do the same for macrons.

            Would it be possible to add macron insensitivity?
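To illustrate what macron insensitivity could look like: in Unicode a macron vowel decomposes into a plain vowel plus a combining macron (U+0304), so it can be folded the same way as other accents. The sketch below uses Python's standard unicodedata module and is only an assumption about one possible approach, not how Zoom handles accents internally (the function name fold_diacritics is made up here):

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Strip combining marks (macrons, accents, umlauts) from text."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_diacritics("tēnei rā"))  # tenei ra
print(fold_diacritics("欢迎"))       # 欢迎 (no combining marks, so unchanged)
```

Applying the same fold to both the indexed words and the query would let tenei match tēnei. Note that this strips every combining mark, which may not be what every language wants, and it does not expand ligatures.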

            • #7
              We'll add that to our list of things to consider for a future release. We're not really familiar with macrons at this point (the tricky thing with what we do is we have to look into how it might affect more than one language), so we would need some time to research before we decide to add it or not. But thanks for the suggestion.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
