Scan extensions bug / limitation


ocgltd
12-22-2007, 08:43 PM
I sent this to support, but thought other users might also have a solution.

On my site I limit direct access to certain file types, by using a script which retrieves and sends the file to the browser. For example, instead of letting a user link to:
www.mysite.com/myHIDDENdirectory/myHIDDENfile.pdf

They must link to
www.mysite.com/getfile.php?type=pdf&file=myHIDDENfile

This works well, and Zoom does a great job of retrieving and indexing the PDF file. (Since I send the header "Content-Type: application/pdf", your program properly recognizes the type of the retrieved file.)
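To illustrate the setup described above, here is a minimal sketch of the gateway logic. It is written in Python for illustration only and is not the actual getfile.php. The directory path, the type table, and the function name resolve_download are all assumptions.

```python
import os.path

# Hypothetical table mapping the "type" query parameter to a Content-Type
# header. Only the PDF entry comes from the post; the rest is assumed.
CONTENT_TYPES = {
    "pdf": "application/pdf",
    "doc": "application/msword",
}

# Assumed on-disk location of the hidden directory.
HIDDEN_DIR = "/var/www/myHIDDENdirectory"


def resolve_download(file_type, file_name):
    """Return (path, headers) for a request like ?type=pdf&file=myHIDDENfile."""
    if file_type not in CONTENT_TYPES:
        raise ValueError("unsupported type: %r" % file_type)
    # Reject names that contain path separators or are empty/dot entries,
    # so a request cannot escape the hidden directory.
    if os.path.basename(file_name) != file_name or file_name in ("", ".", ".."):
        raise ValueError("invalid file name")
    path = HIDDEN_DIR + "/" + file_name + "." + file_type
    headers = {"Content-Type": CONTENT_TYPES[file_type]}
    return path, headers
```

The script would then send the returned headers followed by the file contents, so the browser (or an indexer) only ever sees the getfile URL and the Content-Type header, never the real path.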

However, Zoom seems to use the text of the link URL (not the "Content-Type" header of the response) to determine the file type and choose the associated image. As a result, links to PDF files in the search results show up with the HTML icon/image rather than the PDF icon/image.
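The difference between the two approaches can be sketched as follows. This is not Zoom's actual code, only a guess at the suspected behaviour; the icon file names and both lookup tables are hypothetical.

```python
import os.path
from urllib.parse import urlsplit

# Hypothetical icon tables; the file names are made up for illustration.
ICON_BY_EXTENSION = {".pdf": "pdf.gif", ".php": "html.gif", ".html": "html.gif"}
ICON_BY_CONTENT_TYPE = {"application/pdf": "pdf.gif", "text/html": "html.gif"}
DEFAULT_ICON = "html.gif"


def icon_from_url(url):
    """Pick an icon from the extension of the URL path (suspected behaviour)."""
    path = urlsplit(url).path  # the query string is not part of the path
    ext = os.path.splitext(path)[1].lower()
    return ICON_BY_EXTENSION.get(ext, DEFAULT_ICON)


def icon_from_content_type(header_value):
    """Pick an icon from the Content-Type response header (requested behaviour)."""
    mime = header_value.split(";")[0].strip().lower()  # drop any "; charset=..."
    return ICON_BY_CONTENT_TYPE.get(mime, DEFAULT_ICON)


url = "http://www.mysite.com/getfile.php?type=pdf&file=myHIDDENfile"
print(icon_from_url(url))                         # extension is ".php"
print(icon_from_content_type("application/pdf"))  # header says PDF
```

With an extension-based lookup, the getfile.php link can never resolve to the PDF icon, because the only extension visible in the URL path is ".php".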

At this point the only solutions I can see are these enhancements:

1. Add a configuration page called "Content types" which works like "scan extensions", but overrides the file/scan extension when determining the document type and associated image.

or

2. I really like the way you handle categories, and wonder if you could do the same for content types. For example, if Zoom allowed me to override the file type / scan extension based on what it finds in the link/href string (e.g. "?type=pdf"), then I could reset the content type myself. However, option #1 seems better.
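Option #2 could look something like the sketch below: a list of substring rules that are checked against the href, with the first match winning. This is only an illustration of the idea and not an existing Zoom feature; the rule list and the function name document_type are assumptions.

```python
# Hypothetical override rules: (substring to look for in the href, document type).
OVERRIDE_RULES = [
    ("?type=pdf", "pdf"),
    ("?type=doc", "doc"),
]


def document_type(href, default="html"):
    """Return the type of the first rule whose pattern occurs in the href."""
    for pattern, doc_type in OVERRIDE_RULES:
        if pattern in href:
            return doc_type
    # No rule matched: fall back to whatever the indexer would normally use.
    return default


print(document_type("www.mysite.com/getfile.php?type=pdf&file=myHIDDENfile"))
print(document_type("www.mysite.com/index.php"))
```

Plain substring matching is fragile (it depends on the parameter order in the URL), which is one reason option #1, keyed on the Content-Type header, seems the more robust choice.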

wrensoft
12-22-2007, 11:55 PM
I haven't tested the scenario with the icon, but I suspect you are correct. We are probably using the file extension and not the content type for the icon.

In V6 of the software we are building a much more flexible interface for file types, but that is months away.

Maybe, as a poor workaround, you can make the PHP icon look like a PDF icon, as it is probably picking up the PHP icon.

We will look more deeply into the issue after Christmas, with a view to fixing the bug in V5 of the software early in the new year.