What harm in *not* choosing single case for Asian languages (ZH, JA)?

  • What harm in *not* choosing single case for Asian languages (ZH, JA)?

    I have a fair number of English keywords in Chinese and Japanese documents.
    Users may type those English keywords in any mix of upper and lower case, and I'd like all of those searches to match.
    If I choose not to select single case only, what harm is done to the Chinese (or Japanese) indexing?
    (see also http://www.wrensoft.com/forum/showthread.php?t=3373)

    Peter

  • #2
    This option should only be used if you are indexing a language that has no case distinction and where problems can occur when the script or indexer attempts to convert case (such as some East Asian languages).

    If you use this option for English, you end up with a case-sensitive search. For example, if the word 'Elephant' appeared in the document, searching for 'elephant' would not return a match.
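
The 'Elephant' case can be illustrated with a toy index. This is only a sketch of the general idea, not how Zoom is implemented: `build_index`, `search`, and the `single_case` flag are hypothetical names, and the index is a plain dictionary.

```python
def build_index(words, single_case):
    """Map index keys to original words; fold case unless single_case is set."""
    index = {}
    for word in words:
        # With single_case on, the word is stored exactly as written.
        key = word if single_case else word.lower()
        index.setdefault(key, set()).add(word)
    return index


def search(index, term, single_case):
    """Look up a term using the same key rule that was used at indexing time."""
    key = term if single_case else term.lower()
    return index.get(key, set())


words = ["Elephant", "象"]
strict = build_index(words, single_case=True)   # no case conversion
folded = build_index(words, single_case=False)  # case folded to lower

print(search(strict, "elephant", single_case=True))   # no match for English
print(search(folded, "elephant", single_case=False))  # matches 'Elephant'
print(search(strict, "象", single_case=True))          # unaffected, no case
```

In this sketch the Chinese word is found either way, because it has no upper or lower case form; only the English keyword is affected by the flag.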

    If you don't use this option for some Asian languages, the case conversion process can corrupt some characters under some of the scripting options. The underlying problem is that some scripting languages (ASP, PHP, JS) do not provide good support for Asian languages. As a result, some Asian words won't be searchable. Unfortunately I don't have a list of characters in each language that are safe with each scripting option.
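
One way this kind of corruption can happen is when case conversion works byte by byte on a multi-byte encoding. The sketch below is an assumption for illustration, not something confirmed in this thread: it assumes Shift-JIS text and a conversion routine that treats every byte as an ASCII character. `bytewise_lower` is a hypothetical helper.

```python
def bytewise_lower(text, encoding):
    """Lowercase the encoded bytes as if each byte were an ASCII character."""
    raw = text.encode(encoding)
    # bytes.lower() only changes bytes in the ASCII range A-Z, including
    # trailing bytes of multi-byte characters that happen to fall there.
    return raw.lower().decode(encoding, errors="replace")


word = "ア"  # katakana A; in Shift-JIS its second byte is 0x41, the code for 'A'
corrupted = bytewise_lower(word, "shift_jis")
safe = word.lower()  # Unicode-aware conversion leaves the character alone

print(word.encode("shift_jis"))  # the raw bytes before conversion
print(corrupted == word)         # False: the character was changed
print(safe == word)              # True
```

A word stored in the index after such a conversion no longer matches what the user types, which is consistent with the "won't be searchable" symptom described above.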


    • #3
      If there are only a limited number of English keywords (presumably your product names, company names, acronyms, etc.), you might consider creating synonyms for them, e.g. IBM=ibm,Ibm
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
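
If the keyword list is more than a handful of entries, the synonym lines could be generated rather than typed by hand. This is a minimal sketch that assumes the `word=variant,variant` layout shown in the IBM example above; `synonym_line` is a hypothetical helper, and the exact synonym format Zoom expects should be checked against its documentation.

```python
def synonym_line(keyword):
    """Build 'keyword=variant,...' from the common case variants of a keyword."""
    variants = []
    for candidate in (keyword.lower(), keyword.upper(), keyword.capitalize()):
        # Skip the keyword itself and any duplicate variants.
        if candidate != keyword and candidate not in variants:
            variants.append(candidate)
    return f"{keyword}={','.join(variants)}"


for keyword in ["IBM", "Zoom"]:
    print(synonym_line(keyword))
```

Only three variants (lower, upper, capitalized) are covered here; arbitrary mixes such as 'iBm' would still need the case-insensitive option.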
