Arabic PDF Document Search


ashik
08-22-2007, 05:13 PM
Dear all,
This is regarding Arabic document search. I have a few PDF documents in Arabic. On the Languages tab I have configured Windows-1256 (Arabic), and for the search page language I have selected Arabic.zlang.

Basically, this search is for a CD-ROM, and I have indexed these documents using JavaScript. I only get results for some of the keywords. For example, in English, if I type "product", I get results from the 10 pages that contain that word. In my Arabic search, however, I get no results for common words.

I also noticed that the search engine easily finds words at the start of the document's title, but it does not find a common keyword that appears somewhere in the middle of the document.

All I want from my Arabic search is that a common keyword, like "product" in English, returns more results, and I am not getting this from the search engine.

I would really appreciate if you could help me.

Thanking you,
Regards,
Ashik

Ray
08-23-2007, 02:20 AM
We would need to see the PDF file in question, and examples of the arabic search words you are looking for. Can you e-mail us (http://www.wrensoft.com/contactus.html) with more details and the files attached?

ashik
08-23-2007, 11:11 AM
Dear all,

I have sent you an email with the sample files.

Now I have another problem: the title of the result page shows "Microsoft Word - sa1".

I just want the title of the result page all in English. I have attached all the files, including the Zoom file and the keywords file in Arabic. Please let me know how to get the titles of the result page in Arabic.

My email is ashik@sitesaudia.com
Please email me back if there is any problem with my indexing.

Standing By

Thanking you,
Best Regards,
Ashik

ashik
08-23-2007, 11:13 AM
Please ignore my previous reply.

I want the titles on the results page to be all in Arabic, not in English. Please check the email I sent to info@wrensoft.com.

Ray
08-24-2007, 05:16 AM
We had a look at the files you sent us, but it wasn't completely clear what keywords you had trouble searching with. You sent us a keywords.txt file but it contained what looked like complete sentences or the entire text data of the PDF files. It would help more if you give us specific search phrases as examples. We are not familiar with the Arabic language so it is difficult for us to work out what a common word looks like.

However, we did determine that you are likely to get more search results by enabling the "Substring match for all searches" option on the "Languages" tab of the Configuration window. This is designed for languages where spaces do not necessarily split up words.

With this option enabled, we were able to get matches for all test searches using the words in your keywords.txt file.
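To illustrate why the substring option helps, here is a minimal Python sketch. It is not Zoom's actual matching code; it only contrasts splitting text on spaces with matching anywhere in the text. It assumes the common Arabic case where the definite article "ال" is written attached to the word, so a search for "كتاب" (book) must find it inside "الكتاب" (the book).

```python
# Illustrative sketch only: NOT Zoom's internal search code.
# It contrasts whole-word matching with substring matching.


def whole_word_match(text: str, query: str) -> bool:
    # Split on whitespace and require an exact token match.
    return query in text.split()


def substring_match(text: str, query: str) -> bool:
    # Match the query anywhere, including inside a longer token.
    return query in text


# "The book is on the table": "كتاب" only occurs with the attached
# article, as "الكتاب".
text = "الكتاب على الطاولة"

print(whole_word_match(text, "كتاب"))  # False: no space-separated token equals "كتاب"
print(substring_match(text, "كتاب"))   # True: "كتاب" is found inside "الكتاب"
```

The trade-off is that substring matching can also return pages where the query appears inside an unrelated longer word, so results may be broader than intended.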

> Now I have another problem. That is the title of the result page shows Microsoft Word - sa1

These are the actual titles stored in your PDF files (you can see this by opening the PDF file in Acrobat Reader and clicking "File" -> "Properties"). This is a common mistake when PDF files are created by printing to PDFWriter from Microsoft Word. You can either fix the titles in your PDF files, disable the retrieval of meta information from PDF files in Zoom, or use .desc files to override them.

See this thread for more information:
http://www.wrensoft.com/forum/showthread.php?t=1336
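If there are many PDF files, a small script can help find the ones that still carry the default Word title. The sketch below is hypothetical: it assumes you have already collected each file's Title property (for example from "File" -> "Properties" in Acrobat Reader, or with a PDF library) into a dictionary, and it only checks for the "Microsoft Word - " prefix seen in this thread.

```python
# Hypothetical helper: flags PDFs whose Title property still holds the
# default "Microsoft Word - <document name>" value produced when printing
# from Word. Titles are assumed to have been extracted beforehand.
WORD_TITLE_PREFIX = "Microsoft Word - "


def has_default_word_title(title: str) -> bool:
    """Return True if the title looks like Word's auto-generated one."""
    return title.strip().startswith(WORD_TITLE_PREFIX)


def files_needing_titles(titles: dict[str, str]) -> list[str]:
    """Return the sorted file names whose titles should be replaced."""
    return sorted(
        name for name, title in titles.items() if has_default_word_title(title)
    )


if __name__ == "__main__":
    # Example data; the file names and titles are made up for illustration.
    titles = {
        "sa1.pdf": "Microsoft Word - sa1",
        "sa2.pdf": "دليل المنتج",  # an Arabic title ("product guide")
    }
    print(files_needing_titles(titles))  # prints ['sa1.pdf']
```

The flagged files are the ones to re-title in the PDF itself, or to override with .desc files.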

> I want the title of the results page all in arabic please and not in english.

You can select the "Arabic.zlang" language file on the "Languages" tab of the Configuration window to translate all text inside the script. You can modify the text yourself as well.

If you are referring to the title of the search results page itself, then this is simply the title in your search template (search.html) file which you can modify accordingly.