Zoom Search Engine FAQ - File formats

Q. Does Zoom index ALL the words inside the PDF and DOC documents?

Yes. Zoom converts these files to plain text and indexes all words found in the entire PDF or DOC document. However, images, diagrams, graphs, and other non-text content will not be indexed.

Q. How do I specify my own titles and descriptions for PDF and DOC files?

As many external binary documents do not contain useful title and description information, Zoom allows you to specify custom meta information for any plugin-supported file type.

This option can be enabled for each individual file type. On the "Scan Options" tab of the Configuration window, double-click on a supported file extension and check the option to "Use description (.desc) files". Once it is enabled, the indexer will look for files ending with the ".desc" extension for this file type.

For example, if you have a file called "mydocument.doc", you can create a text file called "mydocument.doc.desc" in the same directory with the following contents:

<title>This is my document custom title</title>
<meta name="description" content="This is my document's custom description">

Zoom will then index the words found within "mydocument.doc", but use the title and description information found in "mydocument.doc.desc" - so that you will see your custom title and description in your search results.
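If you have many documents, the .desc files can be generated with a short script. The following is a minimal Python sketch, not part of Zoom: the function name write_desc_file is hypothetical, and writing the file as UTF-8 is an assumption (use whichever encoding matches your index configuration). Because it is not stated here how Zoom treats HTML entities in .desc files, the sketch rejects characters that would break the two tags rather than escaping them.

```python
from pathlib import Path


def write_desc_file(document: Path, title: str, description: str) -> Path:
    """Write "<document name>.desc" next to the document and return its path.

    Hypothetical helper: it only produces the two lines shown in the
    example above (a <title> tag and a description meta tag).
    """
    # Refuse text that would break the tags instead of guessing at escaping rules.
    if "<" in title or ">" in title:
        raise ValueError("title must not contain '<' or '>'")
    if '"' in description:
        raise ValueError("description must not contain double quotes")

    # "mydocument.doc" -> "mydocument.doc.desc" in the same directory.
    desc_path = document.with_name(document.name + ".desc")
    lines = [
        f"<title>{title}</title>",
        f'<meta name="description" content="{description}">',
    ]
    # Assumption: UTF-8 output; adjust to the encoding your site uses.
    desc_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return desc_path
```

Calling write_desc_file(Path("mydocument.doc"), "This is my document custom title", "This is my document's custom description") writes a "mydocument.doc.desc" file with the same two lines as the example above.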

In Spider Mode, you will have to upload these .desc files to your web site, alongside the files you are indexing. If the Indexer has trouble finding the .desc files on your web server (and you are sure you have uploaded them), read the following question.

Q. Why are my .desc files not being found by the Indexer?

If the Zoom Indexer is unable to find your .desc files in Spider Mode, check with your host that your web server allows files with the ".desc" extension to be hosted. Although they are simply text files, some web servers have extra security restrictions in place which refuse access to any files with an unknown extension.

On Windows web servers running IIS, this setting can be found in the IIS Control Panel, under website properties. Follow these instructions:

  1. Select the site to configure in IIS, right click and select "Properties"
  2. Under the HTTP Headers tab, select "File Types" under the MIME Map section and select "New Type"
  3. Type ".desc" as the associated extension and "text/html" as the content type, and select "OK".

More details are available on the Microsoft TechNet site.
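To check whether your web server is refusing to serve .desc files, you can request one directly and look at the HTTP status code. The following is a minimal Python sketch using only the standard library. The function names and the example.com URL are hypothetical, and the sketch assumes the .desc file's URL is simply the document's URL with ".desc" appended, as in the "mydocument.doc.desc" example earlier.

```python
import urllib.error
import urllib.request


def desc_url_for(document_url: str) -> str:
    """Return the URL where the .desc file is expected (assumption: URL + ".desc")."""
    return document_url + ".desc"


def desc_file_status(document_url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for the document's .desc file."""
    request = urllib.request.Request(desc_url_for(document_url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return error.code


if __name__ == "__main__":
    # Placeholder URL; replace with a document on your own site.
    print(desc_file_status("http://www.example.com/docs/mydocument.doc"))
```

A status of 200 means the file can be fetched. A 403 or 404 for a file that you know has been uploaded suggests that the server is blocking the ".desc" extension, in which case a MIME type mapping like the one described above may be needed.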

Q. Why are some of my PDF files failing to index with a "PDF plugin error"?

There are some limitations with indexing PDF files. If you find that the plugin is failing to index some of your PDF files, it may be because of one of the following:

  • The file is not a valid PDF document. Try opening the file in Acrobat Reader to confirm.
  • The file may have Acrobat security settings enabled, which prevent content from being extracted or copied. You can confirm this by opening the PDF file in Adobe Acrobat and clicking on File -> Document Security -> Display Settings. If so, you can either specify the password that was used to encrypt these files (by double-clicking the PDF extension on the "Scan Options" tab of the Configuration window), which allows Zoom to index them in their decrypted form, or remove the protection on these files via the "Document Security" window in Adobe Acrobat. This setting can also be found in the Security tab of Adobe Distiller Preferences.
  • The file may not contain any textual content. For example, it may have been created by scanning a physical document, which would only store the document as an image. For more information, see the following FAQ.
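The first two causes above can be roughly pre-screened before indexing a large batch of files. The following Python sketch is a heuristic only and is not how the Zoom PDF plugin works: it looks for the "%PDF-" header near the start of the file and for an "/Encrypt" entry anywhere in the file. The function name quick_pdf_check is hypothetical. A stray "/Encrypt" string inside a document could cause a false positive, and the sketch cannot tell whether a PDF contains only scanned images.

```python
from pathlib import Path


def quick_pdf_check(path: str) -> str:
    """Classify a file as 'not a PDF', 'possibly encrypted' or 'no obvious problem'."""
    data = Path(path).read_bytes()

    # A PDF header looks like "%PDF-1.4"; readers expect it near the file start.
    if b"%PDF-" not in data[:1024]:
        return "not a PDF"

    # Encrypted PDFs carry an /Encrypt entry in their trailer dictionary.
    # Searching the raw bytes is a heuristic, not a full parse.
    if b"/Encrypt" in data:
        return "possibly encrypted"

    return "no obvious problem"
```

Files reported as "possibly encrypted" are candidates for the password or protection-removal steps described above. Files reported as "not a PDF" should be opened manually to confirm.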

Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)

When you scan a physical (paper) document with a scanner, the page is captured as an image. PDF files created this way contain images rather than actual text. Effectively, this is similar to taking a photo of your document as opposed to typing it up. If you open your PDF file in Adobe Acrobat Reader and click on the Text Selection tool, you will notice that you cannot select or copy the text, for the same reason. However, if you create PDF files from Word, or use OCR software to create your PDF file, the content is stored as proper text, and Zoom is able to index it without problem.

Adobe provides the Paper Capture online service to convert PDF image files to searchable PDF documents. There is also a Paper Capture Plug-In which you can install for Adobe Acrobat to do the same thing. The more advanced Acrobat Capture software allows you to convert large volumes of PDF files at once.

Q. Why are some of my DOC files failing to index?

Q. I get the error message "Error processing DOC file or unable to write to Zoom folder"

Check that the DOC file is a valid Word document. Note that the file loading in Microsoft Word is not enough to show that it is actually a Word document. You must also click on "File" -> "Properties" -> "General", and look for the document type listed. It must say "Microsoft Word Document".

A common problem is that some users may have RTF files which have been mistakenly renamed to a ".doc" file extension at some point. While Microsoft Office appears to load such a file successfully regardless of the filename, it actually detects the format internally and opens it as an RTF file, without telling the user. If you open the file in Word, follow the above instructions, and see "Rich Text Format Document" listed as the document type instead of "Microsoft Word Document", then this is the case.

We would recommend renaming these files back to their rightful ".rtf" extension. You can then enable the RTF extension to index these files appropriately.

Alternatively, you can save these files in a proper DOC file format, by loading them up in Word, and selecting "File" -> "Save as" -> and under "Save as type:", select "Word document (*.doc)". They will then be indexed successfully.
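If you have many ".doc" files to check, the renamed-RTF case can also be spotted from the first few bytes of each file, without opening Word. The following is a minimal Python sketch, not part of Zoom, with hypothetical names. It relies on two well-known file signatures: RTF files begin with the characters "{\rtf", while binary Word documents are stored in the OLE2 compound file container, which begins with the bytes D0 CF 11 E0 A1 B1 1A E1. Note that other Office formats such as XLS and PPT use the same container, so an "ole2" result does not prove that the file is a Word document.

```python
# Signature of the OLE2 compound file container used by binary Office files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# RTF files start with this text.
RTF_SIGNATURE = b"{\\rtf"


def sniff_doc_format(path: str) -> str:
    """Return 'ole2', 'rtf' or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_SIGNATURE):
        return "ole2"
    if head.startswith(RTF_SIGNATURE):
        return "rtf"
    return "unknown"
```

Any ".doc" file for which this returns "rtf" is a candidate for renaming to ".rtf" or resaving as a proper DOC file, as described above.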

Q. Why are some of my XLS files failing to index?

There are some limitations with indexing XLS files. If you find that Zoom is failing to index some of your XLS files, it may be because of one of the following:

  • The file is not a valid XLS document. Try opening the file in Microsoft Excel to confirm.
  • The XLS file may have been created in an obsolete Excel file format that is not supported (e.g. prior to Excel 95). You can check your file by opening it in Excel, clicking on "File" -> "Save as" and looking at the file type selected in the "Save file type as:" drop-down box. If this is the case, you can convert the file to a newer format by selecting a different file type and clicking "Save". You will then be able to index the XLS file successfully.
  • The XLS file may contain password protected worksheets or workbooks. The XLS plugin currently does not support any XLS files with password protection.

Q. "PPT file created with PowerPoint 2007 is not currently supported"

You are seeing this error because this PPT file was created by Office 2007.

Note that this is different from a PPTX file created by Office 2007 (its native format), or a PPT file created by a version of PowerPoint prior to 2007. These two file formats are supported.

The problem is that the "97-2003 compatible" PPT files created by Office 2007 are NOT in the same format as older PPT files and also not in the PPTX format. They are internally a hybrid of the PPT and PPTX format. Although they may work in older versions of PowerPoint, they do not work in many third party tools (and even some of Microsoft's). This format is not currently supported because Microsoft is unable to provide documentation for it.

So the currently available solutions for indexing these files are to either:
1) Resave these files as native PPTX files using Office 2007
2) Resave these files as native PPT files using an older Office version.

Q. What can Zoom index from my AutoCAD DWF files?

Zoom can extract and index all meta information within a DWF file, in addition to all the properties, layers, and model attributes such as part numbers, description, comments, mass/weight, and anything else that is specified as a property.

However, vector-based text created within the Canvas Pane cannot be searched. This is because the "text" here is not actually constructed as text data; it is stored within the DWF file as vector shapes. This is also why AutoCAD itself does not offer a "Search Text" function for this type of content, as far as we know. It would essentially require OCR (Optical Character Recognition) processing to identify the text and store it separately. Unfortunately, there are some DWF files which are largely made up of vector-based content and lack actual text properties or content. These DWF files are difficult to search, and are akin to a PDF file containing nothing but a scanned image of a paper document. In such cases, you can create custom .desc files to add a meta description and keywords, providing the additional information necessary to make your DWF files more searchable.

Q. My SWF files are created with Flash 8 (or later) and they cannot be indexed with swf2html. What can I do?

From Zoom V6 onwards, SWF files created in Flash 8, Flash 9 or later are supported.

SWF files created in Flash 7 or earlier are no longer supported. These were previously supported by the swf2html plugin created by Macromedia, which is no longer available.

Q. I am using Vista, and when Zoom indexes a non-HTML file, a security warning appears: "The publisher could not be verified. Are you sure you want to run this software?"

Windows Vista expects all executables downloaded from the Internet to be signed so that the publisher can be verified. Unfortunately, some of our plugins are developed by third parties who do not sign their executables, and we are unable to sign the executables on their behalf to avoid this security warning. In addition, Vista will still prompt you (with a different security message) when it runs an application which is signed and verified. As such, this security warning is normal, and you should clear the checkbox labelled "Always ask before opening this file" so that you are not prompted again.

All software downloaded from www.wrensoft.com is guaranteed to be free of spyware, viruses, malware, or adware.
