Chinese Language Search

  • Chinese Language Search

    I’m looking for some help getting Zoom Search to work on a Chinese language website. The search works great on the English section of the website, but I haven’t had any success getting it to work on the Chinese section. I’m guessing it’s an encoding issue, but I think I’ve tried all the possibilities (both UTF-8 and Unicode). I’m using the PHP version of Zoom 5.1. Is there someone out there using PHP to search a Chinese website who can give me some pointers? If you would like to look at the website, it can be found at: http://www.kongandallan.com/cn_index.html
    The Chinese language search boxes can be found under the bottom 3 menu buttons on the right-hand side of the main page (or directly at: http://www.kongandallan.com/cn_casestudies.html ). At present the search is using the unmodified generic search template. Thanks!

  • #2
    Are you using Offline Mode or Spider Mode?

    When I tried to index your site using Spider Mode, I discovered that your server is responding in UTF-16 (double byte). This is pretty rare (most web servers use UTF-8 for Unicode, or one of various single-byte encodings). The current version of Zoom does not handle a UTF-16 response in Spider Mode, but we should be able to add this in the next build (V5.1 build 1012).

    On the other hand, your webpages contain a meta tag specifying the GB2312 charset, so this was pretty confusing.

    The behaviour may be different in Offline Mode, as I can't be certain how your files are stored on disk (they could be in UTF-16 on disk as well, or they could be GB2312 on disk and converted to UTF-16 by the web server).
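
    One way to tell which of those cases applies is to compare the byte order mark at the start of a file with the charset named in its meta tag. The following is a minimal Python sketch and not part of Zoom; the function names `sniff_bom` and `declared_charset` are made up for illustration, and the meta tag scan is deliberately crude.

```python
import codecs
import re

# Byte order marks for the encodings discussed in this thread.
# Note: a UTF-32-LE file also starts with FF FE and would be reported
# as UTF-16-LE here; UTF-32 is left out of this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches charset=... inside a meta tag, with or without quotes.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def declared_charset(data):
    """Return the charset named in the first 2 KB of the page, or None.

    NUL bytes are dropped first so that ASCII markup stored as UTF-16
    can still be matched. This is a rough heuristic, not an HTML parser.
    """
    head = data[:2048].replace(b"\x00", b"")
    match = CHARSET_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


if __name__ == "__main__":
    # Simulates a page saved as UTF-16 that still declares GB2312.
    html = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    page = codecs.BOM_UTF16_LE + html.encode("utf-16-le")
    print("BOM says:", sniff_bom(page))
    print("Meta tag says:", declared_charset(page))
```

    Running the same two functions over a file read from disk (`open(path, "rb").read()`) and over the bytes returned by the web server would show whether the mismatch originates in the files or in the server.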

    In any case, if you can change your web server to respond in UTF-8, this would get around the problem. Otherwise, you can wait for the next build or e-mail us for a test copy in the meantime.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Chinese language search

      Thanks for the quick response! I'm using Offline Mode, but I've also tried Spider Mode with the same result. The pages were originally all defined as UTF-8 (files saved as UTF-8, with matching meta tag definitions). When I wasn't able to get it working in that format, I changed the meta tag to GB2312 because my Chinese contact said I needed to make that change (this is my first experience with a multi-language site, so I have lots to learn). That change didn't have any effect. After more reading on the subject I was led to think I needed to save all the files as UTF-16. Still no luck. I'm guessing the web server is responding in UTF-16 because that is how the files were formatted when I uploaded them. Should I set everything back to UTF-8 and try again? Or does something need to be changed on the web server?
      I appreciate your help!



      • #4
        UTF-16 is rarely used on the web. This is because it wastes a lot of space: you are essentially doubling the file size for all alphanumeric characters (which even Chinese pages will contain, due to the HTML markup).
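
        To put rough numbers on the size difference, here is a small Python sketch that encodes one line of Chinese HTML both ways. The sample markup and the helper name `encoded_sizes` are made up for illustration.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (without BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # 21 ASCII markup characters plus 4 Chinese characters.
    sample = '<p class="intro">中文搜索</p>'
    sizes = encoded_sizes(sample)
    # UTF-8:  21 * 1 + 4 * 3 = 33 bytes
    # UTF-16: 25 * 2         = 50 bytes
    print(sizes)
```

        Each Chinese character here takes 3 bytes in UTF-8 and 2 in UTF-16, so UTF-16 only comes out ahead on text that is almost entirely Chinese with very little markup.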

        I would expect the behaviour to differ between indexing with UTF-8 and indexing with UTF-16. Take note of these things:

        - How many files are indexed
        - Check that you have the same encoding selected in the Zoom Configuration window as the charset you are using for the webpages.
        - Check that your search_template.html is in the same encoding
        - Turn on "Support single case languages" and "Substring match for all searches" on the "Languages" tab
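
        The encoding checks in the list above can be partly automated. The following is a minimal Python sketch, assuming the site files (including search_template.html) sit in one local folder and that UTF-8 is the target encoding; the function names and the `*.html` pattern are assumptions, not anything provided by Zoom.

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8.

    Pure ASCII also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def files_not_utf8(folder, pattern="*.html"):
    """List files under folder matching pattern that are not valid UTF-8."""
    return sorted(
        str(path)
        for path in Path(folder).glob(pattern)
        if not is_valid_utf8(path.read_bytes())
    )


if __name__ == "__main__":
    # Hypothetical usage: check the current folder, template included.
    for name in files_not_utf8("."):
        print("Not UTF-8:", name)
```

        A file saved as UTF-16 or GB2312 will normally fail this check as soon as it contains Chinese text, which makes it a quick way to spot a page or template that was left in a different encoding from the rest.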

        See this page for more information.
        --Ray



        • #5
          Chinese search

          I reformatted all the Chinese pages in UTF-8 and now everything works perfectly! I'm guessing I missed setting the search_template.html to utf-8 my first time around. Thanks again for your help!
