Wildcards with PDF files


  • Wildcards with PDF files

    I'm using ZoomSearch V5.1 (Build 1010) to index the contents of my Help application, developed with RoboHelp V7. I also copy the files generated by ZoomSearch into my RoboHelp application, which lets me embed ZoomSearch as the search tool within the application. When I run a search within my application using the wildcards '*' or '?', ZoomSearch retrieves the right information. The problem is that if I click on one of the results which is a PDF file, the PDF Search Panel says it found 0 documents with 0 instances when the file is displayed. It seems that the PDF viewer doesn't recognize these wildcards. Can you tell me how to fix this problem?
    Thank you for your help.
    Bob.

  • #2
    This question is in reference to the "Highlight and locate within PDF documents (PDF only)" option.

    This option simply passes the search query along to Acrobat Reader and tells it to highlight the word(s) when it loads the document. However, Acrobat Reader does not support wildcard searching, nor many other features that Zoom does (such as exact phrase matching and accent insensitivity). As such, there is little we can do about it, except hope that Adobe will improve this feature in their product in the future.
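As a rough illustration of what "passing the query along" means, the sketch below builds a link of the form shown later in this thread (`filename.pdf#search="Get*"`). It is not Zoom's actual code; the helper name `build_pdf_highlight_url` is hypothetical, and the only thing taken from the thread is the `#search="..."` fragment format.

```python
def build_pdf_highlight_url(pdf_url: str, query: str) -> str:
    """Append the search query as a #search="..." fragment (hypothetical helper)."""
    # A double quote inside the query would end the quoted value early, so drop it.
    safe_query = query.replace('"', "")
    # The query is passed through unchanged otherwise, wildcards included.
    return f'{pdf_url}#search="{safe_query}"'


print(build_pdf_highlight_url("http://domain/filepath/filename.pdf", "Get*"))
```

Note that the wildcard character travels to the reader verbatim, which is why a query such as "Get*" is looked up literally inside the PDF.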

    From the Users Guide:
    Highlight and locate within PDF documents (PDF only)
    This feature will allow searched words to appear highlighted within Acrobat Reader when you click on a PDF document in the list of search results. Acrobat Reader will also scroll and locate the first occurrence of the word. This feature will only work for Acrobat Reader 7.0 or later.

    Note that this feature is dependent on the capabilities of the Acrobat Reader application which does not currently support exact phrase matching and substring matching. This means that it may highlight some words which were not the ones specifically found or matched by Zoom in such cases.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Thank you, Ray, for your response. It means that I will have to live with this lack of functionality until Adobe Acrobat Reader recognizes the wildcards '*' and '?' in its search capability.



      • #4
        Hi Ray,

        I have created a PHP site using version 5.1 Build 1017 Professional Edition so our users can search our API documentation that is currently in PDF form.

        A standard search for our users involves a wildcard - e.g. "Get*".

        When a user submits a search the correct results are returned, but when they click on the link to the file the PDF search engine reports "0 documents with 0 instances" (as other users have reported).

        Originally posted by Ray:
        This option simply passes the search query along to Acrobat Reader, and tells it to highlight the word(s) when it loads up the document. However, Acrobat Reader does not support wildcard searching,...
        From our testing, it appears that Adobe Reader actually defaults to wildcard searching. If we submit a search for "Get" within Adobe Reader itself, it returns all matches that contain "Get". The asterisk is not required within Acrobat Reader.

        The problem seems to arise from the URL that is generated from the search results - e.g.
        Code:
        http://domain/filepath/filename.pdf#search="Get*"
        If we could strip the asterisk out of the link when the file type is .pdf, we would be able to open the PDF and have the search completed as our users expect.

        Is there a way within your PHP files to strip the asterisk out of the link to the file?
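A minimal sketch of the string handling being asked for, written in Python for illustration rather than as a patch to Zoom's PHP scripts. The function name `strip_wildcards_for_pdf` is hypothetical, and the link format is assumed to be the `#search="..."` form shown above. Whether Reader then matches partial words from the URL is a separate question, which the replies below address.

```python
def strip_wildcards_for_pdf(link: str) -> str:
    """Remove '*' and '?' from the #search= fragment of links to .pdf files.

    Hypothetical helper: links to other file types, or without a
    #search= fragment, are returned unchanged.
    """
    base, sep, fragment = link.partition("#")
    # Ignore any query string when checking the file extension.
    path = base.partition("?")[0]
    if not path.lower().endswith(".pdf") or not fragment.startswith("search="):
        return link
    cleaned = fragment.replace("*", "").replace("?", "")
    return base + sep + cleaned


print(strip_wildcards_for_pdf('http://domain/filepath/filename.pdf#search="Get*"'))
```

The same check and replacement could be ported to the search script wherever the result link is assembled; where exactly that happens in Zoom's PHP files is not covered here.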

        Cheers,

        Malcolm.



        • #5
          That's interesting. I don't think Acrobat Reader behaved this way previously - it might be a recent change. We'll look into it.

          Update: I just tried this with Acrobat Reader 8 and it did not appear to be the case. It was not doing wildcard/substring matching from the #search= parameter. Which version are you using?
          --Ray



          • #6
            Hi Ray,

            Originally posted by Ray:
            I just tried this with Acrobat Reader 8 and it did not appear to be the case. It was not doing wildcard/substring matching from the #search= parameter. Which version are you using?
            That's correct. The wildcard matching does not happen when you click on the URL with the #search= parameter, because the asterisk is included in the URL. But if you click "New Search" within the reader and search without the asterisk, Adobe Reader returns all full and partial matches.

            I am using Adobe Reader 8, but I have also tested this behaviour on version 9.

            Cheers,

            Malcolm.



            • #7
              No, I wasn't talking about clicking on a link from Zoom's results.

              I was testing this simply by typing a URL similar to the following into my browser (note, no asterisk):
              http://www.mysite.com/mydocument.pdf#search="keywo"

              And it did not find and match instances of "keyword" (for example). It was not doing wildcard/partial matching as you are saying.

              If you manually click on the "New Search" button and type in a keyword, it does do partial matching then (and has a separate checkbox option for "Match complete word").

              However, this option is NOT available from the #search= URL parameter, which means that there is no way we can call upon it from a link in the search results.
              --Ray



              • #8
                Hi Ray,

                Originally posted by Ray:
                I was testing this simply by typing a URL similar to the following into my browser (note, no asterisk):
                http://www.mysite.com/mydocument.pdf#search="keywo"

                And it did not find and match instances of "keyword" (for example).
                Thanks for the explanation. I assumed the functions available to the "New Search" button within Acrobat would also be available to the URL method. I've done some more tests and confirmed the behaviour you detailed.

                Cheers,

                Malcolm.
