Cannot search scanned PDF Files

  • Cannot search scanned PDF Files

    We have many PDF files that were created from MS Word using Acrobat Standard 9, and other documents that were scanned with a Fujitsu ScanSnap S1500 scanner.

    The MS Word documents can be searched without any problem. However, the scanned documents cannot be searched using Zoom. We have run the Adobe OCR on them and even created the Adobe Embedded Index. Adobe will search the documents without any problem, but Zoom will not. I deleted and re-installed the PDF add-in with no change.

    Thanks, Norm

  • #2
    Can you post a link to one of these OCR'ed files so that we can have a look at it?

    The normal OCR process will add a text layer to the PDF, allowing operations like copying and pasting the text in the file. Zoom should work fine in this case.



    • #3
      Non-indexing PDF file link

      Thanks for your help. Here is the link:

      http://www.heritagehunt.org/Test/horn0501.pdf

      Norm



      • #4
        We were able to index that file here. You can search for many words from that document, such as "kudos" or "james". In fact, 17,233 words were indexed from the PDF document.

        However, your OCR process (like all OCR processes) is limited, and some of the "text" in the scanned paper document was not recognized by Adobe OCR. Typically, these are words rendered at an odd angle or displayed within an unusual graphic. To see which words were successfully identified by Adobe OCR, simply open the same PDF file in Adobe Acrobat Reader and click on "Edit"->"Select All". You will see all the recognized text highlighted, while the unrecognized text (like the names on the front page, "John Bisaga", etc.) is ignored. If you do a copy (CTRL-C) and paste to a text editor, you will see this even more clearly.

        I hope that clarifies the issue. If you think there is still text that was OCR'ed but you are unable to search for it with Zoom, please let us know exactly what search terms you are using.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
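
As a follow-on to the copy-and-paste check described above, here is a minimal sketch (not part of Zoom or Acrobat) that reports which search terms are absent from the pasted text. It assumes you have pasted the "Select All" output into a string or a text file yourself. The function name `missing_terms` and the sample sentence are hypothetical, and the simple word splitting used here is only an approximation, not how Zoom tokenizes documents. It handles single-word terms only.

```python
import re


def missing_terms(extracted_text, terms):
    """Return the single-word terms that do not occur in the extracted text.

    Matching is case-insensitive and on whole words, using a simple
    letters/digits/apostrophe tokenizer (an assumption for illustration).
    """
    words = set(re.findall(r"[a-z0-9']+", extracted_text.lower()))
    return [term for term in terms if term.lower() not in words]


if __name__ == "__main__":
    # Hypothetical sample standing in for text pasted from Acrobat Reader.
    sample = "Kudos to James for organizing the event."
    print(missing_terms(sample, ["kudos", "james", "Bisaga"]))
```

Any term this reports as missing was probably never recognized by the OCR step, so no search engine working from the text layer could be expected to find it.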



        • #5
          Search Problem

          For some reason these documents are not searched on our website. All of our other PDF documents seem to work. Our site is password protected. I would like you to try the site. Is it possible to send you the login information privately so that it will not be displayed on the forum?

          Thanks, Norm



          • #6
            We did one additional test. We created a document with MS Word, converted it to PDF and put it in the same folder on the server as the scanned documents. After indexing, we could search the new document, but not the scanned documents (one of which you tried above). We are indexing the folder, but those docs don't work. I would like to send you the login info so you can look at them.

            Thanks, Norm



            • #7
              Contact details are here. If you do send something, can you also send your Zoom configuration file and the log file from Zoom? (You can save the log from the File menu.)



              • #8
                Just as a follow-up for anyone else reading this thread: Norm has sent us further details and his Index Log indicated that the PDF files in question were skipped because they were over his "Max. file size limit" (which can be changed under "Configure"->"Limits").
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine
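
For anyone who wants to check a folder before indexing, here is a rough sketch that lists the PDF files larger than a given size. It is a standalone script, not a Zoom feature. The function name `files_over_limit`, the folder path, and the 5 MB figure in the usage example are all hypothetical; use whatever value is set under "Configure"->"Limits" in your own configuration.

```python
import os


def files_over_limit(folder, max_bytes, extension=".pdf"):
    """Return (path, size_in_bytes) pairs for files larger than max_bytes.

    Walks the folder recursively and returns the largest files first.
    """
    oversized = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            if name.lower().endswith(extension):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > max_bytes:
                    oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical folder and limit; adjust both to match your own setup.
    for path, size in files_over_limit("scanned_docs", 5 * 1024 * 1024):
        print(f"{size:>12,d} bytes  {path}")
```

Scanned PDFs are usually much larger than PDFs converted from Word because each page is stored as an image, which would explain why only the scanned documents were skipped.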



                • #9
                  File Size Fix Successful

                  I changed the file size limit and it worked perfectly. Thanks for all your help. Norm
