Excluding directory index from search results


  • Excluding directory index from search results

    Hi,

    Here's my situation. I have a content library located at www.mysite.com/library/ with a number of subdirectories: /links, /documents, etc.

    When I index the site, I enter www.mysite.com/library/ as the web site URL to be indexed.

    In the /documents/ subdirectory, not only are the PDFs themselves indexed, but so is the server-generated listing of files in that subdirectory.

    So, if I search for, say, "game", I get links to the documents containing that word as well as a link to the "www.mysite.com/library/documents/" directory, since its listing contains a document named "game.pdf".

    I would like to exclude the directory index (www.mysite.com/library/documents/) from search results. I know one way to do this is to disable directory browsing permissions on the server, but is there a way to do it through script settings?

    Thanks,

    -Alex

  • #2
    Several ways to do this:

    - If there are actual web pages (e.g. http://www.mysite.com/library_index.html) which contain links to all your documents, you would normally use them as the start point, so that your server-generated directory listing is not used at all for finding the links to the documents (provided you don't link to the directory listing anywhere on your website).

    - If you have a local copy of the library documents, you could index in Offline Mode instead, which better suits what you are trying to do: indexing all files under a certain folder, as opposed to following webpage links in spider mode.

    - If you can modify the format/layout of the server-generated directory listings, and you can place a <!--ZOOMSTOP--> tag in the header and a <!--ZOOMRESTART--> tag in the footer, this would prevent Zoom from indexing the content of the directory listings.

    - You can specify the spidering option for your start point(s). If you click on "More" in the Spider mode tab, and then "Edit" the selected start point, you can change it from "Index page and follow internal links" to "Follow links only". This prevents the URL of your start point from being indexed. It will only follow the links from it.

    However, in your case of using "www.mysite.com/library/" as your start point, this would mean that it follows the links to "/links/" and "/documents/" and then proceeds to index and follow those directory listings! So if you wanted to do it this way, you would have to specify each of the subdirectories (containing the actual documents, and no further subdirectories) as an individual start point with the "Follow links only" option.
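
To illustrate the <!--ZOOMSTOP--> / <!--ZOOMRESTART--> option above, here is a minimal Python sketch of where the two tags would end up in a listing page. The function name wrap_listing and the sample HTML are hypothetical and only for illustration; in practice you would normally put the tags into whatever header/footer templates your web server uses for its directory listings, rather than post-processing the HTML like this.

```python
import re

ZOOM_STOP = "<!--ZOOMSTOP-->"
ZOOM_RESTART = "<!--ZOOMRESTART-->"


def wrap_listing(html: str) -> str:
    """Put ZOOMSTOP right after <body ...> and ZOOMRESTART right before </body>.

    Hypothetical helper: it only shows the intended tag placement.
    """
    # Insert the stop tag after the opening <body> tag (attributes allowed).
    html, opened = re.subn(
        r"(<body\b[^>]*>)", r"\1" + ZOOM_STOP, html, count=1, flags=re.IGNORECASE
    )
    # Insert the restart tag before the closing </body> tag.
    html, closed = re.subn(
        r"(</body\s*>)", ZOOM_RESTART + r"\1", html, count=1, flags=re.IGNORECASE
    )
    if not (opened and closed):
        raise ValueError("listing has no <body>...</body> pair to wrap")
    return html


# Made-up example of a server-generated listing for /library/documents/.
listing = (
    "<html><body><h1>Index of /library/documents/</h1>"
    '<a href="game.pdf">game.pdf</a></body></html>'
)
wrapped = wrap_listing(listing)
print(wrapped)
```

With the tags in place, the file names in the listing (such as "game.pdf") sit between the stop and restart markers, which is the region Zoom is told not to index.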
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
