
Indexing Dynamically Generated PDF Files



Anonymous
03-29-2005, 01:00 PM
I've recently purchased a copy of Zoom Professional for one of our
intranet sites, and need to index several PDF files. The problem is
that all of the files are stored in a database, and retrieved by means
of an .aspx page, such as the following:

http://localhost/Synergy/Software/OS+Network/OS+Articles/Guides/Downloads_GetFile.aspx?id=49

Because the filename is Downloads_GetFile.aspx?id=49, the file is NOT
being indexed as a PDF file, and it is only treated as an HTML file.
Is there a way to force Zoom to parse the contents of
'Downloads_GetFile.aspx' as a pdf file? It's really essential to me to
be able to properly index these files. I've looked in the documentation
but didn't see any way to 'force' files to be parsed with one of the
plugins.

Kevin Townsend

Ray
03-30-2005, 12:24 AM
We will be adding support for handling PDF documents (and other plugin-supported formats) served via a server-side script (such as a PHP or ASP page) in Version 4.1 of Zoom. It will determine the document type based on the HTTP Content-Type header, and index the file accordingly. V4.1 should be available in several weeks' time.
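To illustrate the difference Ray describes, here is a rough Python sketch contrasting extension-based detection (which treats `Downloads_GetFile.aspx?id=49` as HTML) with detection based on the Content-Type header. This is not Zoom's code; the two lookup tables and the parser names ("html", "pdf", "doc") are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table for an indexer that trusts the HTTP Content-Type header.
PARSERS_BY_MIME = {
    "text/html": "html",
    "application/pdf": "pdf",
    "application/msword": "doc",
}

# Hypothetical table for an indexer that only looks at the URL's extension.
PARSERS_BY_EXT = {
    ".htm": "html",
    ".html": "html",
    ".aspx": "html",
    ".pdf": "pdf",
    ".doc": "doc",
}


def parser_from_extension(url):
    """Choose a parser from the file extension of the URL path (query ignored)."""
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return PARSERS_BY_EXT.get(ext, "html")  # unknown extensions fall back to HTML


def parser_from_content_type(content_type):
    """Choose a parser from a Content-Type value, ignoring '; charset=...' etc."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return PARSERS_BY_MIME.get(mime, "html")


url = "http://localhost/Synergy/Software/OS+Network/OS+Articles/Guides/Downloads_GetFile.aspx?id=49"
print(parser_from_extension(url))                  # html (the problem in this thread)
print(parser_from_content_type("application/pdf"))  # pdf (what the header says)
```

The same URL yields different answers: only the header-based lookup can tell that the script is really returning a PDF.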

broman
06-13-2005, 03:12 PM
I am having the same issue. I dynamically stream our files to the users. We use a custom extension (.dyn) for the web page that serves the PDFs. I hope V4.1 will allow me to do this.

broman
06-13-2005, 08:29 PM
I figured out a quick and dirty solution to my dynamic, streaming file problem.

Change your website URLs to use the correct extension for each download (change .php to .pdf or .doc).

My links look like this:

http://172.17.30.224/repositories/downloads/xml/certified_supplies_030716.pdf?repositoryName=downloads&index=0

Get ISAPI_Rewrite from Helicon Software. With ISAPI_Rewrite, the URL looks correct to the browser and to Zoom (it shows the .pdf or .doc extension), but the server sees it the way I want, with a .dyn (or .asp, or .php) extension.

Here are the contents of my ISAPI_Rewrite "httpd.ini" file (the RewriteRule is all on one line):

[ISAPI_Rewrite]
RewriteRule /(.*)\.(.*)\?repositoryName=(.*)&index=(.*) /downloads/ViewDownLoad.dyn\?elementId=$1.xml&repositoryName=$3&index=$4
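To see what the rule above does to the example link, here is a small Python sketch that applies the same pattern with the `re` module. It is only an approximation of ISAPI_Rewrite: it assumes the rule is matched against the whole path plus query string (hence `fullmatch`), and it drops the `\?` escape in the replacement because Python's replacement syntax treats `?` literally.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the httpd.ini rule; assumed to be matched against path + "?" + query.
PATTERN = re.compile(r"/(.*)\.(.*)\?repositoryName=(.*)&index=(.*)")
# Same target as the rule; \1, \3, \4 correspond to $1, $3, $4 in ISAPI_Rewrite.
REPLACEMENT = r"/downloads/ViewDownLoad.dyn?elementId=\1.xml&repositoryName=\3&index=\4"


def rewrite(url):
    """Return the request string the server would see after the rewrite."""
    parts = urlsplit(url)
    request = parts.path + "?" + parts.query
    match = PATTERN.fullmatch(request)
    if match is None:
        return request  # rule does not apply; request passes through unchanged
    return match.expand(REPLACEMENT)


public_url = ("http://172.17.30.224/repositories/downloads/xml/"
              "certified_supplies_030716.pdf?repositoryName=downloads&index=0")
print(rewrite(public_url))
```

Group 1 captures everything up to the dot before the extension and group 2 the extension itself (`pdf`), which the rule discards; that is why the public URL can carry `.pdf` or `.doc` while the server only ever runs `ViewDownLoad.dyn`.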

Ray
06-14-2005, 03:59 AM
Interesting workaround.

And yes, V4.1 should allow you to index PDF files served from a script regardless of the file extension (so ".dyn" would work). However, you would have to make sure that your server-side script is setting the HTTP content-type header properly to indicate the file type.
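As a sketch of what "setting the HTTP content-type header properly" means for a download script, here is a minimal Python WSGI app standing in for something like `Downloads_GetFile.aspx?id=49` (in ASP.NET the equivalent is setting `Response.ContentType`). The `DOCUMENTS` dictionary is a hypothetical stand-in for the database table, and storing a MIME type per row is an assumption, not something the posters described.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for the database: id -> (filename, MIME type, bytes).
DOCUMENTS = {
    "49": ("guide.pdf", "application/pdf", b"%PDF-1.4\n% placeholder\n%%EOF\n"),
    "50": ("notes.doc", "application/msword", b"placeholder .doc bytes"),
}


def get_file_app(environ, start_response):
    """Serve a stored file, labelling it with its real MIME type."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    doc_id = query.get("id", [""])[0]
    if doc_id not in DOCUMENTS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    filename, mime, data = DOCUMENTS[doc_id]
    start_response("200 OK", [
        # This header is what lets an indexer parse the response as PDF/DOC
        # even though the URL ends in a script name rather than .pdf/.doc.
        ("Content-Type", mime),
        ("Content-Length", str(len(data))),
        ("Content-Disposition", f'inline; filename="{filename}"'),
    ])
    return [data]


def call_app(app, query_string):
    """Invoke a WSGI app in-process and return (status, headers dict, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"QUERY_STRING": query_string}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(get_file_app, "id=49")
print(status, headers["Content-Type"])
```

To try it in a browser, the app can be served with `wsgiref.simple_server.make_server("", 8000, get_file_app).serve_forever()`.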