Question on indexing Japanese language site


  • Question on indexing Japanese language site

    I currently run an English-language website that has been using Zoom Search for a while with great success. The nature of the content on my site has led me to start developing a Japanese sister site, and I wish to continue using Zoom Search. Looking around the site, I see that Japanese is a supported language, but I was concerned by this statement:


    If your website is encoded in UTF-8, Zoom will successfully index your site, and will be capable of performing searches. However, search performance and accuracy is limited, as Zoom will only split words by
    • Formatting (spaces between words, or paragraphs, etc.)
    • Change of character type (from hiragana to katakana, etc.)

    This means that an entire sentence may be indexed as a "word". However, if you enable "Substring match for all searches" on the "Languages" tab of the Configuration window, then searches which appear within a sentence will match correctly.
    Zoom does not currently support indexing Shift-JIS pages. You will have to convert your website to UTF-8 if you wish to use it with Zoom.
    Some words in the Japanese language use both kanji and hiragana to write out a single word, and my concern is that this would cause searches to fail. For example, the word "split" is written with one kanji and one hiragana. If Zoom splits at the hiragana that follows the single kanji and indexes the result as two words, what happens when someone searches for the kanji-and-hiragana string?
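To make the concern concrete, here is a rough Python sketch of the two splitting rules quoted above (whitespace and change of character type). It is not Zoom's actual code: the function names, the Unicode ranges used to classify characters, and the sample word 割る (chosen here only as an example of one kanji followed by one hiragana) are all assumptions made for illustration.

```python
def script_of(ch):
    """Classify a character by script (assumed ranges, not Zoom's own table)."""
    cp = ord(ch)
    if ch.isspace():
        return "space"
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def split_by_script(text):
    """Split text on whitespace and on every change of character type."""
    tokens = []
    current = ""
    current_script = None
    for ch in text:
        script = script_of(ch)
        if script == "space":
            # Whitespace ends the current token and is not indexed itself.
            if current:
                tokens.append(current)
            current, current_script = "", None
            continue
        if current and script != current_script:
            # Character type changed (e.g. kanji to hiragana): start a new token.
            tokens.append(current)
            current = ""
        current += ch
        current_script = script
    if current:
        tokens.append(current)
    return tokens


# A one-kanji-plus-one-hiragana word ends up as two separate tokens.
print(split_by_script("割る"))
```

Under these assumptions the single word is stored as two index entries, which is exactly the situation the question is about.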

  • #2
    I think it should still work provided that the same split will be done when the search string is entered.
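As a rough illustration of why the same split on both sides should still match, the Python sketch below tokenises the pages and the query with one shared function and then looks up every query token. This is only a toy model under stated assumptions: `tokenize`, `build_index`, `search`, the Unicode ranges, and the sample sentences are hypothetical and say nothing about how Zoom is actually implemented.

```python
import re

# One run of hiragana, katakana, or kanji, or a run of any other non-space characters.
# The ranges are assumptions for this sketch, not Zoom's real character tables.
TOKEN_RE = re.compile(
    r"[\u3040-\u309F]+|[\u30A0-\u30FF]+|[\u4E00-\u9FFF]+"
    r"|[^\s\u3040-\u30FF\u4E00-\u9FFF]+"
)


def tokenize(text):
    """Split on whitespace and on changes of character type."""
    return TOKEN_RE.findall(text)


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query)  # same split as at indexing time
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {1: "皿を割る", 2: "時間がない"}
index = build_index(docs)
print(search(index, "割る"))
```

In this toy model the query is split into the same two tokens that were indexed, so the page is still found. Note that the lookup only requires both tokens to occur somewhere in the page, not next to each other, which is one reason results can be less precise than for space-delimited languages.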

    Do you have an example where it is not working? (Note that, as mentioned in the quote above, Japanese-language searches are never going to be as accurate as searches in English and other Latin-based languages.)
