We've confirmed that this is a bug in the current release [V6.0.1028]. It has been fixed in the V7 Alpha release.
If you wish to apply the fix manually by editing the PHP script, then search for this line in "search.php":
Code:
$query = preg_replace("/[\s\(\)\^\[\]\|\{\}\%\£\!]+|[\-._',:&\/\\\](\s|$)/u", " ", $query);
And replace it with this:
Code:
$query = preg_replace("/[\s\(\)\^\[\]\|\{\}\%\!]+|[\-._',:&\/\\\](\s|$)/u", " ", $query);
Note that you will have to be very careful when you're editing the PHP script and we would not advise doing this if you are uncomfortable with PHP scripting.
Note that the "search.php" file in the output folder will be rewritten when you re-index. You can modify the source copy under "C:\ProgramData\Wrensoft\Zoom Search Engine Indexer\scripts\PHP or ASP\" but note that modified scripts are difficult for us to support as functionality may be broken by incorrect modifications. So if you are uncomfortable with editing, then use V7 Alpha.
Hi, thanks so much, man. I will try it and I hope everything goes right.
Thank you.
Hi guys, I have one more question: how can I make the search engine match all of the following letters when searching for any one of them?
"أ" alif with hamza above
"إ" alif with hamza below
"ا" plain alif
"آ" alif with madda above
The CHM search engine is able to find:
( أ , ا, إ )
all at once when searching. Is there any way I can do that with Zoom Search?
Thanks in advance.
While I haven't looked at this in detail, I would have assumed these characters would just work like any other character if you are using UTF-8 as the character set.
Is there something special about these characters compared to all other Arabic characters?
I use UTF-8 as the character encoding in my language settings.
The characters I mentioned in my post above appear at the start of Arabic words, for example:
أمي
Even when I enable "Strip Diacritics", it still can't find similar words like
امي
إمي
Sometimes when typing in Arabic we don't put the hamza on the alif, as in أ or إ.
We simply type it as " ا " (without the quotes), so it would be nice to be able to find all of these when searching for a word that contains any one of the 3 characters, as in this example:
search :
ان
results :
ان + إن +أن
OK I see, you are asking for the 3 types of alif character to be treated as the same character. So when you search for one of them, it matches the other 2 versions of the character. Correct?
Like we do for French accents, é and e for example.
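As a rough illustration of that kind of folding (this is not how Zoom implements it, and `foldAlif` and `matchesIgnoringAlifForm` are made-up names for this sketch), mapping the three hamza/madda forms onto the plain alif before comparing could look like this in JavaScript:

```javascript
// Hypothetical helpers (not part of Zoom's scripts): fold the hamza and
// madda forms of alif to the bare alif before comparing two strings.
const ALIF_VARIANTS = /[\u0622\u0623\u0625]/g; // آ (U+0622), أ (U+0623), إ (U+0625)
const BARE_ALIF = "\u0627"; // ا (U+0627)

function foldAlif(text) {
  return text.replace(ALIF_VARIANTS, BARE_ALIF);
}

function matchesIgnoringAlifForm(indexedWord, searchWord) {
  return foldAlif(indexedWord) === foldAlif(searchWord);
}

// The example from the thread: three spellings that differ only in the alif.
const indexedWords = ["\u0627\u0646", "\u0625\u0646", "\u0623\u0646"]; // ان, إن, أن
const hits = indexedWords.filter((w) => matchesIgnoringAlifForm(w, "\u0627\u0646"));
console.log(hits.length); // 3
```

Folding both the indexed word and the search word means any of the three forms can be typed and still match the other two.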
Hi, I wonder if it's possible for the highlight.js file to highlight words with diacritics, like this example:
http://jsfiddle.net/FUg85/15/
We can probably add something like that into V7. However, we're not familiar with Arabic lettering so I'm not entirely sure how universal the above suggestion is. Did you write that bit of code yourself, or is it from someone else? Are you aware that it simply strips the following 5 characters:
|ِ
|َ
|ٍ
|ً
|ّ
From the two strings being compared? Is that enough to fix all issues with diacritic marks in Arabic or are there other marks that are not addressed by this approach?
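For reference, here is a minimal sketch of the mark-stripping idea, written from scratch rather than copied from the linked fiddle. It assumes the marks of interest are the eight code points U+064B to U+0652 (the three tanween forms, fatha, damma, kasra, shadda and sukun), which is a wider set than the five characters listed above. The function names are hypothetical, and the input is assumed to be plain text rather than HTML:

```javascript
// Assumption: "diacritics" here means the code points U+064B..U+0652.
const MARKS = "\u064B-\u0652";

// Remove the marks so two strings can be compared letter by letter.
function stripMarks(text) {
  return text.replace(new RegExp(`[${MARKS}]`, "g"), "");
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches the term's letters with any marks after each one.
function markTolerantPattern(term) {
  const letters = Array.from(stripMarks(term)).map(escapeRegExp);
  return new RegExp(letters.map((ch) => `${ch}[${MARKS}]*`).join(""), "g");
}

// Wrap every mark-insensitive occurrence of term in <b>...</b>.
function highlight(text, term) {
  if (stripMarks(term).length === 0) return text; // nothing to search for
  return text.replace(markTolerantPattern(term), (m) => `<b>${m}</b>`);
}

const vocalised = "\u0643\u064E\u062A\u064E\u0628\u064E"; // كَتَبَ
const bare = "\u0643\u062A\u0628"; // كتب
console.log(stripMarks(vocalised) === bare); // true
console.log(highlight(vocalised, bare) === `<b>${vocalised}</b>`); // true
```

Matching with a tolerant pattern, instead of stripping the page text first, keeps the original marks inside the highlighted span.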
I found the script example above on the net, but it could serve as an example for the JavaScript that highlights words with diacritics in the highlight script file. These are the standard Arabic characters:
أ alif with hamza above
ب baa
ت taa
ث close to thaa
ج jaa
ح haa or 7aa
خ khaa
د daa
ذ thaa
ر raa
ز zaa
س saa
ش shaa
ص close to saa
ض close to daa
ط close to taa
ظ close to thaa
ع ayin or close to aaa
غ close to khaa
ف faa
ق close to kaa
ك kaa
ل laa
م maa
ن naa
هـ haa
و waa
ي yaa
You could use Notepad to see these characters better, and to understand how they sound you could use an Arabic text-to-voice program like the one linked at the end of this post.
These are the diacritics used with the letters; I put each one on ( ـ ) as a placeholder for an Arabic letter:
( ـُ )
damma
( ـَ )
fattha
( ـِ )
kassra
( ـٌ )
tanween damma or double damma
( ـً )
tanween fattha or double fatha
( ـٍ )
tanween kassra or double kassra
( ـْ )
skoon
( ـّ )
shadda
( ـُّ )= -ّ + -ُ
damma above shadda
( ـِّ )= -ّ + -ِ
shadda above kassra
( ـَّ )= -ّ + -َ
fattha above shadda
( ـٌّ ) = -ّ + -ٌ
double damma above shadda
( ـٍّ ) = -ّ + -ٍ
shadda above tanween kassra
إ = ا + ء
stand alone character
hamza under alif
أ = ا + ء
stand alone character
hamza above alif
آ = ا + ~
stand alone character
madda above alif
لأ = ل + أ
stand alone character
laa with hamza above alif
لإ = ل + إ
stand alone character
laa with hamza under alif
لآ = ل + آ
stand alone character
laa with madda above alif
( ؤ )= و + ء
stand alone character
hamza above waw
ئ = ى + ء
stand alone character
hamza above short alif
( ى ) short alif stand alone character
( ء ) just hamza consider as stand alone character
https://acapela-box.com/AcaBox/index.php
If you guys need more information on how to type these with a keyboard, I'm glad to help with more details. Check the wiki page for images and information:
http://en.wikipedia.org/wiki/Arabic_diacritics
Last edited by mrbasserby; 12-03-2012 at 03:23 PM.
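The decompositions listed in the post above line up with Unicode canonical decomposition, with one difference: the standard splits these letters into a base letter plus a combining mark (U+0653 madda above, U+0654 hamza above, U+0655 hamza below) rather than the stand-alone hamza ء (U+0621), and the base letter of ئ is ي (U+064A). A small sketch using JavaScript's built-in `String.prototype.normalize` follows. `codePoints` and `foldHamzaForms` are hypothetical helper names, and whether ؤ and ئ should be folded for search as well is an open question:

```javascript
// Format each code point of a string as "U+XXXX".
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Precomposed letters from the post: أ إ آ ؤ ئ
const composed = ["\u0623", "\u0625", "\u0622", "\u0624", "\u0626"];
for (const letter of composed) {
  // NFD splits each letter into its base letter plus a combining mark.
  const parts = codePoints(letter.normalize("NFD")).join(" + ");
  console.log(`${codePoints(letter).join("")} = ${parts}`);
}

// Hypothetical helper: decompose, drop the combining madda/hamza marks
// (U+0653 to U+0655), then recompose whatever is left.
function foldHamzaForms(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0653-\u0655]/g, "")
    .normalize("NFC");
}

// أمي and إمي fold to the same string, امي.
const a = foldHamzaForms("\u0623\u0645\u064A");
const b = foldHamzaForms("\u0625\u0645\u064A");
console.log(a === b); // true
```

Compared with a hand-written table of alif forms, this relies on the decomposition data that ships with the JavaScript runtime, at the cost of also folding ؤ to و and ئ to ي.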