PDF, Doc, and Docx not indexing content


  • PDF, Doc, and Docx not indexing content

    I have tried everything and cannot get these documents to index their text content.

    When I try a tiny PDF (made from a Word doc, so it's text) with the content of "Credenza rose paraphernalia", this is what the search results look like for that document:

    8. No title
    %PDF-1.6 % 7 0 obj endobj 13 0 obj /Filter/FlateDecode/ID[ ]/Index[7 11]/Info 6 0 R/Length 51/Prev 89257/Root 8 0 R/Size 18/Type/XRef/W[1 2 1] str...
    Terms matched: 1 - Score: 10 - 22 Aug 2014 - 87k - URL: http://www.lgbthealthlink.org/test.pdf

    The document shows up when I search for "test", but searching for "credenza" or any other word from its content returns nothing.

    I am still testing on my local machine, but I am indexing a live site.

    Can you give me any direction?

  • #2
    It looks like the PDF file is being indexed as a plain text file.

    Under "Configure"->"Scan options", make sure you have ".pdf" specified with the "Acrobat document" file type. If not, remove it, and re-add it with the correct file type.

    If you are still having trouble, let us know whether you are indexing in Spider Mode, and which version and build of the indexer you are using.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      I am having the same issue.

      I do have ".pdf" specified with the "Acrobat document" file type. I am indexing with Spider Mode. I am using Version 7.0 Build 1008 for Mac.

      Steve



      • #4
        In your index log, what message do you see for that PDF file being indexed?

        You should see two lines, one purple and one green, that say:

        Processing PDF file http://www.mysite.com/myfile.pdf
        Indexing http://www.mysite.com/myfile.pdf

        If either of these lines is missing, then the file is not being recognized as a PDF.
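For a long index log, the check above can be scripted. The following is a minimal sketch, not part of Zoom: it assumes the log has been saved as plain text with one message per line, and `pdf_log_status` is a hypothetical helper name. It only looks for the two message strings quoted above.

```python
def pdf_log_status(log_text, url):
    """Report which of the two expected log messages appear for a URL.

    Assumes a plain-text log with one message per line. Lines are matched
    by their ending, so a leading timestamp or prefix is tolerated.
    """
    lines = [line.strip() for line in log_text.splitlines()]
    processing = f"Processing PDF file {url}"
    indexing = f"Indexing {url}"
    return {
        "processing": any(line.endswith(processing) for line in lines),
        "indexing": any(line.endswith(indexing) for line in lines),
    }
```

If `"processing"` comes back `False` for a PDF URL, that matches the case described above where the file is not being treated as a PDF.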

        Since you are indexing in Spider Mode, this can also be due to the way your web server is serving the file. If your web server sends an HTTP header that says the PDF file is a TXT or HTML file, then Zoom will treat it as such regardless of the file extension.
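One way to see what the server reports is to request the file's headers directly. This is a rough sketch using only the Python standard library. It assumes the header in question is the standard `Content-Type` header and that a correctly served PDF is reported as `application/pdf`; the function names are illustrative and not part of Zoom.

```python
import urllib.request


def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_served_as_pdf(content_type_header):
    """True if the header value declares the standard PDF media type."""
    return media_type(content_type_header) == "application/pdf"


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header ("" if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

For example, `is_served_as_pdf(fetch_content_type("http://www.mysite.com/myfile.pdf"))` should be `True` for a correctly configured server. A value such as `text/plain` or `text/html` would point to the server configuration rather than the indexer settings.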

        Can you give us the URL in question? You can also email us your .zcfg configuration file and we can take a closer look.
        --Ray
