Indexing a Drupal CMS Website

  • Indexing a Drupal CMS Website

    Although I purchased Zoom v6.0 a while ago, I am only now getting around to using it in anger on a website. I can get it to work on sites hosted on Apache servers using sites built with Dreamweaver 8 no problem…

    What I want to be able to do, is to index the following page and all pages below it:

    http://www.leisureandculturedundee.com/library/wighton

    The pages were created, and are edited and stored, within a Drupal CMS. You'll notice that the starting page doesn't have the extension .htm, .html or .php.

    I don't have access to the Drupal server, so I want to host search.php and the associated index files on another, non-Drupal server and call them from a search form stored within the Drupal site.

    I can get it to work, but whatever I try, Zoom will only index the top-level page and none of the pages below it. I have tried adding all the pages using the Configuration option, without success.

    Can someone help because as we say in Britain “It’s doing my head in…” (Causing much frustration)?

    Many Thanks in anticipation

    Brian

  • #2
    Under "Configure"->"Scan options", make sure to check the box for "Scan files with no extensions".

    I just had a quick go at indexing that URL (with that option enabled) and I indexed many pages from that URL forward. No need to add the pages manually, etc.

    If you still have trouble, send us the ZCFG file and we can take a look at what might be in your current configuration that is getting in the way.

    Also check the "Log" tab for messages indicating why a file might have been skipped, etc.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Another Question

      Thanks, Ray, for getting back so quickly. Much appreciated. Your solution worked well and the results are much better than Google's!

      Using the following:
      Starting URL: http://www.leisureandculturedundee.com/library/wighton
      Base URL: http://www.leisureandculturedundee.com/library/

      Zoom appeared to index almost everything below ‘/library’ including ‘/library/wighton’. I was able to search for and find words in all folders below /library such as /library/taybridge which I don't think should happen…

      What I now need to be able to do is limit what is being indexed…

      As an example, our Tay Rail Bridge Disaster pages at http://www.leisureandculturedundee.com/library/taybridge have a number of sub-pages beginning with the prefix ‘tay’. An example would be: http://www.leisureandculturedundee.com/library/taybodiesone

      I would like to be able to only index pages beginning with the prefix ‘tay’ or any other prefix of my choice… Kind of like a wildcard index in a sense.

      Hope this all makes sense.

      Cheers in anticipation.

      Brian



      • #4
        Originally posted by Brian Hayes
        Using the following:
        Starting URL: http://www.leisureandculturedundee.com/library/wighton
        Base URL: http://www.leisureandculturedundee.com/library/

        Zoom appeared to index almost everything below ‘/library’ including ‘/library/wighton’. I was able to search for and find words in all folders below /library such as /library/taybridge which I don't think should happen…
        Not sure why you think that shouldn't happen; that's precisely the idea given those two settings. The spider starts at the starting URL and queues up every link it finds that is "under" the base URL, crawling each in turn. So if there is a link to /library/taybridge, it will be indexed.
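
        As a rough illustration of that behaviour (not Zoom's actual implementation), here is a minimal Python sketch of a spider with a start URL and a base URL. The link graph, the example.com URLs, and the crawl function are all hypothetical stand-ins for the real site.

```python
from collections import deque

# Hypothetical link graph standing in for a real site: page -> links found on it.
LINKS = {
    "http://example.com/library/wighton": [
        "http://example.com/library/wighton/history",
        "http://example.com/library/taybridge",
        "http://example.com/contact",
    ],
    "http://example.com/library/wighton/history": [],
    "http://example.com/library/taybridge": [
        "http://example.com/library/taybodiesone",
    ],
    "http://example.com/library/taybodiesone": [],
    "http://example.com/contact": [],
}


def crawl(start_url, base_url, links):
    """Return pages reached from start_url, following only links under base_url."""
    queue = deque([start_url])
    seen = {start_url}
    indexed = []
    while queue:
        page = queue.popleft()
        indexed.append(page)
        for link in links.get(page, []):
            # A link is queued only if it sits "under" the base URL.
            if link.startswith(base_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


pages = crawl(
    "http://example.com/library/wighton",
    "http://example.com/library/",
    LINKS,
)
for page in pages:
    print(page)
```

        In this sketch the taybridge pages are reached because the start page links to them and they sit under the base URL, while the /contact page is left out.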

        If you want it to do something else, then you should be considering different settings.

        Originally posted by Brian Hayes
        What I now need to be able to do is limit what is being indexed…

        As an example, our Tay Rail Bridge Disaster pages at http://www.leisureandculturedundee.com/library/taybridge have a number of sub-pages beginning with the prefix ‘tay’. An example would be: http://www.leisureandculturedundee.com/library/taybodiesone

        I would like to be able to only index pages beginning with the prefix ‘tay’ or any other prefix of my choice… Kind of like a wildcard index in a sense.
        You'd have to go about it the opposite way, i.e. specify patterns for the pages you don't want indexed. For most websites this makes more sense: you keep the spider away from certain pages, rather than giving it a vague idea of where it can go. Note that if the spider does not visit a page, it cannot find the links on that page, including links to pages that would otherwise qualify for indexing. It has no way of knowing what links are on your website until it comes across them on a page it has indexed. So it's not very practical to give it a wildcard to follow and expect it to find all those pages. That would only work if all pages matching the wildcard are linked to each other, and it would be very confusing for most users.
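
        To make the point about unreachable pages concrete, here is a small Python sketch comparing an "only follow the wildcard" crawl with a "crawl everything except the skip patterns" crawl. The site paths, the link graph, and the crawl helper are hypothetical, and this is not how Zoom is implemented internally.

```python
# Hypothetical link graph: page path -> links found on that page.
LINKS = {
    "/library/taybridge": ["/library/taybodiesone", "/library/wighton"],
    "/library/taybodiesone": [],
    # This 'tay' page is linked only from a page that does not match the prefix.
    "/library/wighton": ["/library/taymemorial", "/library/forms/join"],
    "/library/taymemorial": [],
    "/library/forms/join": [],
}


def crawl(start, links, allowed):
    """Visit pages reachable from start, entering only pages where allowed(page) is true."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen or not allowed(page):
            continue
        seen.add(page)
        # Links are discovered only on pages the spider actually visits.
        stack.extend(links.get(page, []))
    return seen


# Include-only: follow nothing but pages starting with the 'tay' prefix.
include_only = crawl("/library/taybridge", LINKS, lambda p: p.startswith("/library/tay"))

# Exclude-based: follow everything except paths containing a skip pattern.
exclude_based = crawl("/library/taybridge", LINKS, lambda p: "/forms/" not in p)

print(sorted(include_only))
print(sorted(exclude_based))
```

        The include-only crawl never sees /library/taymemorial, because the only link to it is on /library/wighton, a page the spider was told not to visit. The exclude-based crawl finds it.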

        So under "Configure"->"Skip options", the skip pages list should contain a list of path patterns that you want to exclude from indexing, e.g.
        /forms/
        /calender/
        /doit
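
        A quick way to sanity-check a skip list before re-indexing is to run the patterns against a few sample URLs. The Python sketch below assumes each pattern is treated as a plain substring of the URL; that matching rule is an assumption for illustration, so check the Zoom documentation for the exact behaviour. The is_skipped helper and the example.com URLs are hypothetical.

```python
# Skip patterns from the list above.
SKIP_PATTERNS = ["/forms/", "/calender/", "/doit"]


def is_skipped(url, patterns=SKIP_PATTERNS):
    """Return True if any pattern occurs anywhere in the URL (assumed substring match)."""
    return any(pattern in url for pattern in patterns)


# Hypothetical sample URLs to check against the skip list.
samples = [
    "http://example.com/library/forms/join",
    "http://example.com/library/doitnow",
    "http://example.com/library/taybridge",
]
for url in samples:
    print(url, "-> skipped" if is_skipped(url) else "-> indexed")
```

        Under that assumption, a pattern without a trailing slash, such as /doit, also matches longer names like /doitnow, so a trailing slash is worth adding when only a folder should be excluded.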

        If you really want to specify just the pages you want to index, you can create different start points (by clicking on the "More" button next to the start URL) and limit each of them to a single page.
        --Ray
