Indexing files I don't want it to



Beard
11-23-2007, 08:58 AM
Hi,
On our company intranet, every directory holds 3 copies of each page: the page was originally written in Word (don't ask, we've told them not to do web pages in Word), then the Word document is converted to HTML, and another copy is made in PDF format. How can I stop the ZoomSearch software from indexing the PDF documents? I've got CRC turned on, and it's skipping the Word documents as they are identical. I'm pretty sure the PDF files aren't linked from the web pages anywhere, so there's no reason to index them. I can't really do anything to the PDF files themselves as the site is huge.

I'm currently using spider mode.

cheers

wrensoft
11-23-2007, 08:19 PM
If you don't want to index PDF files, then remove .PDF from the list of file types to scan (on the scan options tab). But this seems too obvious, so maybe I am missing the point?

Also, using the CRC option will not filter out Word documents that happen to have the same text as HTML documents. Documents need to be byte-for-byte identical before they are filtered with the CRC option (at least in V5 of the software).
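
To illustrate the byte-for-byte point, here is a minimal Python sketch. It is not how Zoom implements its CRC option internally; the function names and the sample bytes are made up for illustration, and it only assumes that a CRC filter compares checksums computed over the raw file contents.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 checksum of the raw bytes (hypothetical helper)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_duplicate(a: bytes, b: bytes) -> bool:
    """Treat two files as duplicates only when their checksums match."""
    return crc32_of(a) == crc32_of(b)


# Two byte-for-byte identical HTML files: same checksum, so one is skipped.
html_page = b"<html><body><p>Holiday policy</p></body></html>"
html_copy = b"<html><body><p>Holiday policy</p></body></html>"

# Stand-in for a Word file carrying the same visible text. The surrounding
# bytes differ, so the checksum differs even though the wording is the same.
word_like = b"\xd0\xcf\x11\xe0 Holiday policy"

print(is_duplicate(html_page, html_copy))
print(is_duplicate(html_page, word_like))
```

So a Word, HTML and PDF copy of the same page will not be treated as duplicates of each other, because the files differ at the byte level even when the text a reader sees is the same.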