PDF Search issue

  • PDF Search issue

    I have a PDF that was created from the same Word document as an HTML file. The HTML file searches fine, but the PDF never appears in the search results. The PDF has no access restrictions. Indexing appeared to complete without problems, and I uploaded the new data and settings files; I did not upload a new search.php. I added an extra output line to search.php, and it shows that only one file, the HTML, matches the term "Sant", even though the HTML and the PDF have exactly the same content. The PDF itself opens fine. I have tried scanning both the presentation and text layers, and I have added pdf to the extension list. The file size limit is 30 MB, which is adequate.

    The pdf file is http://96.0.183.210/Media/PDF/BiographyOfHuzurMaharaj.pdf

    The search starts at: http://96.0.183.210/Media/SearchIndex.html

    Any suggestions? - Thanks

  • #2
    We had a look at that PDF file and confirmed that the "pdftotext" plugin is unable to extract content from it. On the surface, we do not see any of the more common reasons why text cannot be extracted (there is a text layer, extraction is not disallowed, etc.).

    But we also noticed that the file was generated with a third-party package known as "Nitro PDF".

    The "pdftotext" application (and the XPDF package it is a part of) is one of the longest-established PDF text extractors. It is developed by the Xpdf group and the open source community, and it is used in many applications that support PDF.

    We suspect the problem is that "Nitro PDF" generated the file in a format which is not 100% compliant with, or identical to, the way Adobe Acrobat creates PDF files. We have not seen this problem with any PDF file generated by Adobe Acrobat.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
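
For anyone who wants to reproduce the check described above on their own machine, here is a minimal Python sketch. It assumes the `pdftotext` command-line tool is installed and on the PATH; the function names (`extract_text`, `contains_term`, `diagnose`) are made up for illustration, and the case-insensitive substring test is only a rough stand-in for a real search, not how Zoom matches terms.

```python
import subprocess


def extract_text(pdf_path, pdftotext_cmd="pdftotext"):
    """Run pdftotext on pdf_path and return whatever text it extracts.

    Assumes pdftotext is installed; passing "-" as the output file
    asks it to write the extracted text to standard output.
    """
    result = subprocess.run(
        [pdftotext_cmd, pdf_path, "-"],
        capture_output=True,
        check=True,  # raise if pdftotext itself reports an error
    )
    return result.stdout.decode("utf-8", errors="replace")


def contains_term(text, term):
    """Case-insensitive substring check (a stand-in for a real search)."""
    return term.lower() in text.lower()


def diagnose(text, term):
    """Classify the extraction result for a given search term."""
    if not text.strip():
        # Nothing but whitespace or page breaks came out of the extractor.
        return "no text extracted"
    if contains_term(text, term):
        return "term found"
    return "text extracted, term missing"
```

Calling `diagnose(extract_text("BiographyOfHuzurMaharaj.pdf"), "Sant")` on a local copy of the file would then indicate whether the extractor produced any text at all, which is the step that appears to fail here, or whether text came out but the term was missing.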


    • #3
      OK

      Thanks, I'll check with Nitro. Are there any low-cost third-party Word-to-PDF converters you can suggest?


      • #4
        The latest version of Word includes a free Save as PDF function. See:
        http://www.microsoft.com/downloads/d...displaylang=en

        Also try the free "CutePDF".
