Four Observations on PDF searching in V 5

    Hi: here are a few notes on the PDF plugin & version 5:
    1. Words that should be skipped (e.g. "and") are skipped by ZOOM but are apparently passed to the Acrobat search engine. I searched for "Meadow and Woods" (a committee name, searched with and w/o the quotes); ZOOM said the "and" was skipped, but the Acrobat Search Panel results included a couple of hundred hits for "and" without the other two words.
    2. Search.html takes wildcards, but the wildcard is passed as is to the Acro Reader. I searched for meadow? and ZOOM showed results. Clicking on one brought up the Acro reader which said meadow? was not found.
    3. Is there any way to stop the ZOOM search.html page from displaying the first few words or phrases of the PDF files in the results page?
    4. Good news: I tried using a computer which had Windows ME and Acro Reader 6 on it. I logged on to my site, which has the ZOOM-indexed PDF files, and tried searching it. To my surprise it worked, highlighting and all. The only difference was that the Acro Reader search result panel was headed by the search phrase, followed by the results in tree fashion. But it worked, despite the Acro Reader being only V 6.
    Despite my comments 1 - 3 above, I really like the new PDF capabilities. I just deleted the "accepts wildcards" phrase from Search.html and will put in instructions not to include trivial words like "and" in a search phrase.
    WHK

  • #2
    Yes, there are a few minor differences worth noting with the PDF internal highlighting feature. They are due to the limited capabilities of Adobe's Acrobat Reader, and there is little we can do to change them. However, we will be documenting these issues in the final release. I think you have covered most of them above, but to confirm...

    Originally posted by whk
    1. Words that should be skipped (e.g. "and") are skipped by ZOOM but are apparently passed to the Acrobat search engine. I searched for "Meadow and Woods" (a committee name, searched with and w/o the quotes); ZOOM said the "and" was skipped, but the Acrobat Search Panel results included a couple of hundred hits for "and" without the other two words.
    Yes, we pass the skipped words along, and the internal Acrobat Reader will highlight the skipped word if it is found.
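For illustration, here is a minimal sketch of how skipped words could be filtered out before the link to the PDF is built. It assumes the highlighting works through Adobe's documented `#search="word list"` PDF open parameter (a quoted, space-separated list of words for Reader to search for and highlight). The `SKIP_WORDS` set and the `acrobat_search_url` helper are hypothetical and are not Zoom's actual code.

```python
from urllib.parse import quote

# Hypothetical skip-word list; Zoom's real list lives in its own configuration.
SKIP_WORDS = {"and", "or", "the", "a", "of"}


def acrobat_search_url(pdf_url: str, query: str, drop_skip_words: bool = True) -> str:
    """Build a PDF link carrying an Acrobat '#search=' open parameter (sketch)."""
    # Split the query on whitespace and strip any surrounding quote marks.
    words = [w.strip('"') for w in query.split()]
    words = [w for w in words if w]
    if drop_skip_words:
        # Drop words the search engine itself ignored, so that Reader
        # does not go on to find every "and" in the document.
        words = [w for w in words if w.lower() not in SKIP_WORDS]
    # The word list is enclosed in quotation marks and separated by spaces,
    # then percent-encoded the way a browser would encode the fragment.
    fragment = '"' + " ".join(words) + '"'
    return f"{pdf_url}#search={quote(fragment)}"


print(acrobat_search_url("minutes.pdf", '"Meadow and Woods"'))
```

With filtering switched on, a search for "Meadow and Woods" would hand Reader only "Meadow Woods", so the stray hits for "and" would not appear in the Acrobat Search Panel.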

    2. Search.html takes wildcards, but the wildcard is passed as is to the Acro Reader. I searched for meadow? and ZOOM showed results. Clicking on one brought up the Acro reader which said meadow? was not found.
    Yes, Acrobat Reader does not support wildcards, and we decided it was too computationally expensive to pass the full list of matched words to the Reader at this point. So, unfortunately, wildcard searches will not be highlighted within Acrobat Reader.
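The trade-off can be seen in a minimal sketch of what highlighting a wildcard search would require: expanding the pattern into every concrete indexed word it matches, so that this list (rather than the literal `meadow?`) could be handed to Reader. This is a hypothetical illustration, not Zoom's implementation. It assumes shell-style semantics (`?` matches exactly one character, `*` any run of characters) via Python's `fnmatch`, and the `expand_wildcard` name and sample vocabulary are made up.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return every indexed word that matches a shell-style wildcard pattern."""
    pattern = pattern.lower()
    # Linear scan over the whole vocabulary: this is the cost that grows with
    # the size of the index and with the number of wildcard terms in a query.
    return sorted({w for w in vocabulary if fnmatchcase(w.lower(), pattern)})


vocab = ["meadow", "meadows", "meadowland", "woods", "wood"]
print(expand_wildcard("meadow?", vocab))  # ['meadows']
print(expand_wildcard("wood*", vocab))    # ['wood', 'woods']
```

On a large site a short pattern such as `a*` could expand to a very long word list, which is the kind of list that becomes impractical to pass along in a link to the PDF.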

    3. Is there any way to stop the ZOOM search.html page from displaying the first few words or phrases of the PDF files in the results page?
    I presume you are using the JavaScript version of the Zoom search script, since you mentioned "search.html". The first few words or phrases are extracted when you have enabled "Meta description" to be displayed on the "Results Layout" tab of the Configuration window but no meta description can be found for the document.

    To prevent this, you can either disable meta descriptions for all of your search results - OR - specify meta descriptions for your PDF documents.

    You can do the latter by:
    a) Enabling "Retrieve internal meta information" (double click on the ".pdf" extension on the "Scan Options" tab of the Configuration window). This will use the Description entered in the Document Summary of the PDF file.
    or,
    b) Creating custom description files for your PDF files and enabling "Use description (.desc) files". See this FAQ:
    http://www.wrensoft.com/zoom/support...html#descfiles
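The choice of what appears under each result can be sketched as a simple fallback. This is a hypothetical illustration: the function and parameter names and the 20-word snippet length are assumptions, and the thread does not say which source wins when a PDF has both a Document Summary description and a .desc file, so the sketch simply takes one description string.

```python
from typing import Optional


def result_description(meta_description: Optional[str], body_text: str,
                       show_descriptions: bool = True, max_words: int = 20) -> str:
    """Pick the text shown under a search result (hypothetical sketch)."""
    # Option 1: descriptions switched off for all results.
    if not show_descriptions:
        return ""
    # Option 2: an explicit description (e.g. from the PDF's Document Summary
    # or a .desc file) is used whenever it is non-empty.
    if meta_description and meta_description.strip():
        return meta_description.strip()
    # Otherwise fall back to the opening words of the document.
    words = body_text.split()
    snippet = " ".join(words[:max_words])
    return snippet + (" ..." if len(words) > max_words else "")


print(result_description(None, "Minutes of the Meadow and Woods committee", max_words=3))
```

Either of the two fixes above takes the last branch out of play: disabling descriptions returns early, and supplying a description means the opening words are never used.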

    4. Good news: I tried using a computer which had Windows ME and Acro Reader 6 on it. I logged on to my site, which has the ZOOM-indexed PDF files, and tried searching it. To my surprise it worked, highlighting and all. The only difference was that the Acro Reader search result panel was headed by the search phrase, followed by the results in tree fashion. But it worked, despite the Acro Reader being only V 6.
    That is good to hear. We've only previously tested with V7 and have only seen Adobe documents that reference V7 regarding this capability. But I think you're right, and the feature may well work for V6 as well.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Update on Zoom & Acrobat Reader 6

      Originally posted by Ray
      That is good to hear. We've only previously tested with V7 and have only seen Adobe documents that reference V7 regarding this capability. But I think you're right, and the feature may well work for V6 as well.
      I tried using Zoom 5 with Acrobat Reader 6 again and got very inconsistent results. Sometimes it worked fine. Other times (with the same search word!!) clicking on a Zoom result brought up the correct PDF file, but Reader neither went to the appropriate page nor highlighted the word until I clicked on the result in the Adobe search result panel. Still other times, with the same search, same database and same site, Zoom went right to the correct page and automatically highlighted the found word. I have no explanation for this unless the browser cache was somehow involved. When it did not work, the search parameters were displayed in the Acrobat search result panel. This was with Windows ME. When I previously tried Zoom with Reader 6 I only did one search; it worked fine, so maybe I was just lucky.
      Walter K.



      • #4
        Highlighting within Acrobat Reader is entirely up to the Adobe Acrobat application itself. There are some known inconsistencies in how Acrobat searches for text within PDF files, largely due to the nature of the PDF file format, which does not necessarily separate text from layout.

        I should point out that these inconsistencies are not related to the way Zoom itself searches for words inside PDF documents. The above is only in relation to the "Highlight and locate matched words within PDF document viewer" feature.

        Also, given the findings above, we will keep our documentation and requirements stating that this feature is available for Acrobat Reader 7.0 or later only. It would seem that Acrobat Reader 6.0 had this partially implemented, but not consistently.
        --Ray
