Exclude folders setting not working - Zoom Version 5.0



nirbhay
05-01-2007, 05:25 AM
Hello All
We recently purchased Zoom Enterprise Ver 5.0.
After listing certain web folders (sensitive information) to be excluded from the general search, the Zoom Indexer still indexes pages in these excluded folders.
We use Spider Mode (ASP). The website uses .NET Framework 2.0.

The folders are entered using the following syntax (without the quotes):
"/Admin/"
"/Security/"

I even tried giving the full path to the folder, still with no success.
I read somewhere on the forum that folder names are case sensitive, which seems odd for a web folder.
Anyway, I tried all of these, with no success.


Please help

Thanks

Ray
05-01-2007, 05:56 AM
Web folder names are most certainly case sensitive. This is actually the most common situation, because the majority of web servers are Unix-based (where file and path names have always been case sensitive). By definition, filenames and paths in URLs are case sensitive; only servers such as IIS on Windows map the names across to match regardless of case.

Can you give us the messages in the Index Log which report the files that were indexed (the ones that you want to skip) and their exact URLs? This will give us a better idea of what the URLs look like exactly, and whether the skip entry needs to be entirely in lowercase, uppercase, mixed case, etc.

Remember that the skip file list is matched against the entire URL of the file being crawled in Spider Mode (including the slashes). The URLs that would be skipped by your above examples should look something like:
http://www.mysite.com/mypages/Admin/blah.html
http://www.mysite.com/Security/somethingorother.asp
http://www.mysite.com/page.asp?file=/Security/new.html
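
To illustrate the matching rule described above, here is a minimal Python sketch. It is not Zoom's actual code; the function name is_skipped is made up for this example, and it assumes only what is stated in this thread, namely that each skip entry is compared as a case-sensitive substring against the entire URL.

```python
def is_skipped(url, skip_list):
    """Return True if any skip entry occurs anywhere in the URL.

    The comparison is a plain, case-sensitive substring test, which is
    the behaviour described in this thread (an assumption, not Zoom's code).
    """
    return any(entry in url for entry in skip_list)


skip_list = ["/Admin/", "/Security/"]

urls = [
    "http://www.mysite.com/mypages/Admin/blah.html",
    "http://www.mysite.com/Security/somethingorother.asp",
    "http://www.mysite.com/page.asp?file=/Security/new.html",
    "http://www.mysite.com/mypages/admin/blah.html",  # lowercase "admin"
]

for url in urls:
    print(is_skipped(url, skip_list), url)
```

The first three URLs match because the entry appears somewhere in the URL with the same case, including inside a query string. The last one does not match, because "/admin/" differs in case from "/Admin/".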

nirbhay
05-01-2007, 04:02 PM
Hello

Sorry, I had overlooked the Unix-based web servers.
Anyway, here's the log entry:

08:40:23 - [DOWNLOAD] Downloading file http://testserver/myweb/reports/support.aspx (64864 bytes)

08:40:23 - [INDEXED] Indexing http://testserver/myweb/reports/support.aspx

I included /Reports/ as an entry in the Skip options tab.

I would like to skip indexing any aspx pages within the reports folder.

Also, while I have your attention, I noticed that if there is an '&' character in a PDF filename, the Indexer gives an error when indexing that PDF. Is there a workaround for this?

Thanks for the help

Ray
05-02-2007, 01:51 AM
As mentioned above, the skip list is case sensitive: differences between upper and lower case are significant.

So a skip entry of "/Reports/" will NOT skip
http://testserver/myweb/reports/support.aspx

To skip that URL, you will need a skip entry of "/reports/". Add this to your skip list, and try indexing again.

As for your ampersand ('&') problem - this is probably due to the fact that you are not encoding your ampersand characters properly on your web pages. It is illegal HTML and an invalid URL to have a stand-alone ampersand character in a URL. Such characters need to be encoded, either as an HTML entity ("&amp;") or escaped in the URL ("%26").

More information on this here:
http://htmlhelp.com/tools/validator/problems.html#amp
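
As a rough illustration of the two encodings mentioned above, here is a small Python sketch using only the standard library. The filename and the query string are hypothetical examples, not taken from the site in question, and how the links on your pages are generated is an assumption.

```python
from html import escape
from urllib.parse import quote

# Hypothetical PDF filename containing an ampersand.
filename = "Sales & Marketing.pdf"

# Percent-encode the filename for use in the path of a URL:
# "&" becomes "%26" and each space becomes "%20".
url_path = quote(filename)
print(url_path)

# Hypothetical link whose query string uses "&" as a separator.
# When written into an HTML attribute, "&" becomes the entity "&amp;".
href = escape("page.asp?a=1&b=2")
print(href)
```

Use the "%26" form when the ampersand is part of the file name itself, and the "&amp;" form when the ampersand separates query parameters in a link written into HTML.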

If this is not the problem, then please give us actual URLs of the files in question, and the error message you are seeing.