When building the index, the indexer must decide how to split a sentence into words. The following characters are considered to be part of a word:
• Lower case characters, ‘a’ to ‘z’
• Upper case characters, ‘A’ to ‘Z’
• Digits, ‘0’ to ‘9’
• Foreign (accented) characters, ‘À’ to ‘ÿ’
• A join character (defined by the user, e.g. dot (‘.’), dash/hyphen (‘-’) or underscore (‘_’)) immediately followed by another valid character (one of the above), e.g. “2.5” and “F.B.I.”. See "Indexing options" for more information.
Any character that is not in this list forces the current word to end and a new word to start. For example, with the default configuration, the sentence
“Record number 653-45+ABCD is invalid”
is broken up into six words:
“Record”, “number”, “653-45”, “ABCD”, “is”, “invalid”
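The splitting rule described above can be sketched in a few lines of Python. This is only an illustration, not the indexer's actual code: the function names `is_word_char` and `split_words` are hypothetical, the join characters dot, dash and underscore are an assumed configuration, and digits are treated as word characters because the examples “2.5” and “653-45” imply it.

```python
from typing import List


def is_word_char(ch: str) -> bool:
    """Return True for the basic word characters listed above.

    Assumption: digits count as word characters, as the examples imply.
    """
    return (
        "a" <= ch <= "z"
        or "A" <= ch <= "Z"
        or "0" <= ch <= "9"
        or "\u00c0" <= ch <= "\u00ff"  # 'À' to 'ÿ'
    )


def split_words(text: str, join_chars: str = ".-_") -> List[str]:
    """Split text into words using the rule described above.

    join_chars is a hypothetical stand-in for the user-defined join
    characters; the real default set may differ.
    """
    words: List[str] = []
    current: List[str] = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_word_char(ch):
            current.append(ch)
        elif ch in join_chars and nxt and is_word_char(nxt):
            # A join character stays in the word only when the very
            # next character is a basic word character.
            current.append(ch)
        else:
            # Any other character ends the current word.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("Record number 653-45+ABCD is invalid"))
# ['Record', 'number', '653-45', 'ABCD', 'is', 'invalid']
```

In this sketch the dash in “653-45” is kept because a digit follows it, while the ‘+’ is neither a word character nor a join character, so it ends “653-45” and “ABCD” starts a new word.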