Indexing Dynamically Generated PDF Files
  • Indexing Dynamically Generated PDF Files

    I've recently purchased a copy of Zoom Professional for one of our
    intranet sites, and need to index several PDF files. The problem is
    that all of the files are stored in a database, and retrieved by means
    of an .aspx page, such as the following:

    http://localhost/Synergy/Software/OS...ile.aspx?id=49

    Because the filename is Downloads_GetFile.aspx?id=49, the file is NOT
    indexed as a PDF file; it is treated only as an HTML file.
    Is there a way to force Zoom to parse the contents of
    'Downloads_GetFile.aspx' as a PDF file? It's really essential to me to
    be able to properly index these files. I've looked in the documentation
    but didn't see any way to 'force' files to be parsed with one of the
    plugins.

    Kevin Townsend

  • #2
    We will be adding support for PDF documents (and other plugin-supported formats) served via a server-side script (such as a PHP or ASP page) in Version 4.1 of Zoom. It will determine the document type from the HTTP Content-Type header and index the file accordingly. V4.1 should be available in several weeks' time.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
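As a rough illustration of the change described above (not Zoom's actual code), the sketch below contrasts guessing a document's type from the URL extension with reading it from the HTTP Content-Type header. The parser labels ("pdf", "doc", "html"), the function names and the example URL are hypothetical; falling back to HTML mirrors the behaviour reported in the first post.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit

# Illustrative labels only; these are NOT Zoom's internal plugin names.
BY_EXTENSION = {".pdf": "pdf", ".doc": "doc"}
BY_MIME_TYPE = {"application/pdf": "pdf", "application/msword": "doc", "text/html": "html"}

def type_from_extension(url: str) -> str:
    """Guess the type from the URL path's extension (the problem in the first post)."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # .aspx, .php, .dyn and friends are unknown here, so they fall back to HTML.
    return BY_EXTENSION.get(ext, "html")

def type_from_header(content_type: str) -> str:
    """Take the type from the HTTP Content-Type header instead, ignoring the URL."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lower-cases "type/subtype" and drops parameters like charset.
    return BY_MIME_TYPE.get(msg.get_content_type(), "html")

url = "http://localhost/Downloads_GetFile.aspx?id=49"  # hypothetical URL
print(type_from_extension(url))             # html
print(type_from_header("application/pdf"))  # pdf
```

The same .aspx URL is classified as HTML by its extension but as a PDF once the server's header is trusted, which is why the script must send the right Content-Type.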



    • #3
      Re: Indexing Dynamically Generated PDF Files

      I am having the same issue. I dynamically stream our files to users. We use a custom extension (.dyn) for the web page that serves the PDFs. I hope V4.1 will allow me to index them.



      • #4
        Solution for dynamic links

        I figured out a quick and dirty solution to my dynamic, streaming file problem.

        Change your website URLs to use the correct extension for each download (change .php to .pdf or .doc).

        My links look like this:

        http://172.17.30.224/repositories/do...nloads&index=0

        Get ISAPI_Rewrite from Helicon. With ISAPI_Rewrite, the URL looks correct to the browser and to Zoom (it shows the .pdf or .doc extension), but the server sees it the way I want, with a .dyn (or .asp or .php) extension.

        Here are the contents of my ISAPI_Rewrite "httpd.ini" file (the RewriteRule is all on one line):

        [ISAPI_Rewrite]
        RewriteRule /(.*)\.(.*)\?repositoryName=(.*)&index=(.*) /downloads/ViewDownLoad.dyn\?elementId=$1.xml&repositoryName=$3&index=$4
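To see what the rule does, here is a rough Python translation, assuming ISAPI_Rewrite matches the pattern against the whole request URL including the query string. The `rewrite()` helper and the sample URL are hypothetical; `$1`..`$4` become `\1`..`\4`, and the escaped `\?` in the replacement is written as a plain `?`.

```python
import re

# Python translation of the RewriteRule above (a sketch, not ISAPI_Rewrite itself).
# Assumption: the pattern has to match the whole request URL (path + query string).
PATTERN = re.compile(r"/(.*)\.(.*)\?repositoryName=(.*)&index=(.*)")
TARGET = r"/downloads/ViewDownLoad.dyn?elementId=\1.xml&repositoryName=\3&index=\4"

def rewrite(url: str) -> str:
    """Return the URL the server actually runs, or the URL unchanged if no match."""
    m = PATTERN.fullmatch(url)
    return m.expand(TARGET) if m else url

# Hypothetical request as seen by the browser and by Zoom (note the .pdf extension).
public = "/docs/manual.pdf?repositoryName=Public&index=0"
print(rewrite(public))
# /downloads/ViewDownLoad.dyn?elementId=docs/manual.xml&repositoryName=Public&index=0
```

The captured extension (`$2`) is simply dropped: the browser and Zoom see `.pdf`, while the .dyn script receives only the element ID, repository name and index.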



        • #5
          Interesting workaround.

          And yes, V4.1 should allow you to index PDF files served from a script regardless of the file extension (so ".dyn" would work). However, you would have to make sure that your server-side script is setting the HTTP content-type header properly to indicate the file type.
          --Ray
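A minimal sketch of a download script that sets the header properly, written with Python's standard `http.server` instead of ASP or a .dyn handler. The `FILES` dictionary stands in for the database, and the ID, filename and bytes are made up. Because the type is announced in `Content-Type`, the request path can end in `.aspx`, `.dyn` or anything else.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for the database table: id -> (filename, MIME type, bytes).
# The bytes are a fake placeholder, not a valid PDF.
FILES = {
    "49": ("manual.pdf", "application/pdf", b"%PDF-1.4\n% placeholder\n"),
}

class DownloadHandler(BaseHTTPRequestHandler):
    """Serves stored files by ?id=..., whatever the request path looks like."""

    def do_GET(self):
        query = parse_qs(urlsplit(self.path).query)
        record = FILES.get(query.get("id", [""])[0])
        if record is None:
            self.send_error(404, "No such file")
            return
        filename, mime, body = record
        self.send_response(200)
        # The important header: tells browsers and indexers the real document type.
        self.send_header("Content-Type", mime)
        self.send_header("Content-Length", str(len(body)))
        # Optional: suggests a filename with the proper extension.
        self.send_header("Content-Disposition", f'inline; filename="{filename}"')
        self.end_headers()
        self.wfile.write(body)

def build_server(port=8000):
    """Create (but do not start) a server on localhost:port."""
    return HTTPServer(("localhost", port), DownloadHandler)
```

To try it, call `build_server().serve_forever()` and request something like `http://localhost:8000/Downloads_GetFile.dyn?id=49`; the response carries `Content-Type: application/pdf` despite the .dyn path.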

