Adding Special Characters

  • Adding Special Characters

    Is it possible to add special characters or ligatures to the search engine's accent handling, for example by appending values to $AccentChars and $NormalChars in settings.php?

    My client is indexing PDF files that contain transcriptions of 17th-century Italian. Search works well using UTF-8 with "enable accent/diacritic insensitivity..." enabled. However, some of the glyphs and ligatures in the documents are not covered by the variables mentioned above, so a user who spells an indexed word with "normal" characters may not get the same results as one who uses the special characters. Adding words with their normalized spelling to the synonyms list does work, but the authors would have to produce a list of hundreds of normalized words, which is not feasible. We need the accent/diacritic insensitivity to handle these characters automatically.

    A good example of a special character not included in those variables is the long s (ſ). A search for a word spelled with a long s does not return the same results as a search for the same word spelled with a regular s.

  • #2
    Appending values to $AccentChars and $NormalChars will not work -- this only affects the search script. You will need the changes to be made within the Indexer too.

    Making such changes to the Indexer would require custom development. This is the first time we have had such a request. If you are interested in a quote for custom development, please send us a comprehensive list of the characters you need and the normal characters each one should be mapped to. It is probably best to e-mail us with more details.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
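
For readers wondering what such a character mapping looks like in practice, here is a minimal sketch in Python. It is not Zoom's code and says nothing about how the Indexer works internally. The function name `fold` and the `EXTRA_MAP` table are hypothetical, and the entries in the table are only examples. The sketch relies on Unicode compatibility decomposition (NFKD), which already maps the long s to a regular s and splits ligatures such as ﬁ into their letters. The explicit table covers characters that normalization leaves alone.

```python
import unicodedata

# Hypothetical extra mappings for characters that Unicode normalization does
# not decompose by itself. Replace these with the project's own list.
EXTRA_MAP = {
    "æ": "ae",
    "œ": "oe",
    "ß": "ss",
}


def fold(text: str) -> str:
    """Reduce text to a plain lowercase key for accent-insensitive matching."""
    # NFKD splits accented letters into base letter + combining mark and
    # replaces compatibility characters (long s, fi ligature) with plain letters.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left behind by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Apply the explicit table to whatever normalization did not handle.
    return "".join(EXTRA_MAP.get(ch, ch) for ch in lowered)


print(fold("eſſere"))   # essere
print(fold("perché"))   # perche
```

As the reply above points out, a mapping like this only helps if the same folding is applied both when the documents are indexed and when the query is processed. Applying it on the search side alone leaves the index holding the unfolded spellings.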
