Indexing Must List

  • Indexing Must List

    A great addition to the Skip List for indexing would be a Must List. That is, files whose filenames and paths do not contain any of the words in the Must List would be skipped.

    It would make it quite simple to configure an index that covers only one or two directories of a large, broad website.

    Perhaps allow something like

    +word

    in the Skip List for words that must appear.
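    Here is a minimal Python sketch of how the suggested "+word" convention could be separated out of a single Skip List. The function name parse_skip_list and the handling of blank or bare "+" entries are assumptions made for illustration. This is not how Zoom actually reads its Skip List.

```python
def parse_skip_list(entries):
    """Split Skip List entries into (skip_words, must_words).

    Hypothetical convention from the suggestion above: an entry that starts
    with "+" is a must word; every other non-blank entry is a skip word.
    """
    skip_words = []
    must_words = []
    for raw in entries:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if entry.startswith("+"):
            word = entry[1:].strip()
            if word:  # a bare "+" is ignored rather than treated as a word
                must_words.append(word)
        else:
            skip_words.append(entry)
    return skip_words, must_words


if __name__ == "__main__":
    skip, must = parse_skip_list(["cgi-bin", "+bibs", "", "private/", "+journals"])
    print(skip)  # ['cgi-bin', 'private/']
    print(must)  # ['bibs', 'journals']
```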

    p.
    :: ::

  • #2
    Yes, I can see how this might be useful for some sites. But I think it would also cause a lot of support grief, and no one else has asked for this yet.

    For example, indexing of the home page would fail because people forget to enter index.html into the Must List. Then we get (even more) support E-Mail along the lines of, "your indexer won't index my site..."



    • #3
      <grin>

      Aww, c'mon. It's a prominent, useful feature of at least two other search engines I've used. The idea is primarily intended to support directory selection (the same idea as limiting the spider to the base domain), and even if it were limited to that context alone, it would be great to have.

      If there are no words in the must list, then nothing is excluded.

      If, say, you put in a directory name, then any file in that directory would be included, and any file not in the directory would not. Seems pretty simple.
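      The rule described above can be sketched as a small Python filter. This is a hypothetical illustration only. The name should_index is invented, matching is a plain substring test, and the existing Skip List is assumed to take precedence over the Must List.

```python
def should_index(url, skip_words, must_words):
    # Hypothetical rule set for the proposed Must List (not Zoom's real code).
    # 1. Anything matching the existing Skip List is excluded first
    #    (assumed precedence; plain substring matching for simplicity).
    if any(word in url for word in skip_words):
        return False
    # 2. An empty Must List excludes nothing.
    if not must_words:
        return True
    # 3. Otherwise the URL must contain at least one Must List word.
    return any(word in url for word in must_words)


if __name__ == "__main__":
    urls = [
        "http://www.mydomain.com/",
        "http://www.mydomain.com/bibs/list.html",
        "http://www.mydomain.com/mugs/",
    ]
    # Empty Must List: all three URLs are kept.
    print([u for u in urls if should_index(u, [], [])])
    # Directory name as the only must word: only the /bibs/ page is kept.
    print([u for u in urls if should_index(u, [], ["/bibs/"])])
```

      With a Must List of ["/bibs/"], the site's home page http://www.mydomain.com/ is skipped because it contains no must word. This is the kind of surprise raised in reply #2.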

      p.
      :: ::



      • #4
        We'll certainly keep it in mind and have it in our list of things to consider for a future version. But as with all feature requests, we must determine its priority by the number of user requests. So at this point, it will be a low priority request until further interest is gathered. We do appreciate the suggestion however, and agree that it can be useful.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          Seems pretty simple
          I think you are overestimating the number of people who read the Users Guide before E-Mailing us.

          If, say, you put in a directory name, then any file in that directory would be included
          You can already do exactly this via the start point list in Zoom.



          • #6
            Originally posted by wrensoft
            I think you are overestimating the number of people who read the Users Guide before E-Mailing us
            <grin> Trust me. I'm not. Just today I received eMail addressed to Noam Chomsky marked "personal and confidential," requesting that not even his staff read it. It was sent via a mail form on one of my websites which clearly states the recipient is me and not anyone else.

            Q: What's the cheapest upgrade for any piece of software?
            A: RTFM.

            You can already do exactly this via the start point list in Zoom.
            Generally, yes. To be honest, I'd overlooked the implicit equivalence of a Must List and Zoom's start-point list, because it doesn't quite work that way for me. The script that generates one of my sites doesn't think

            http://www.mydomain.com/bibs
            and
            http://www.mydomain.com/bibs/

            are equivalent. It hiccups on the latter, and Zoom won't index the script output. Moreover, unless I list the full path with the trailing slash, Zoom will also follow links to pages in

            http://www.mydomain.com/
            http://www.mydomain.com/mugs

            etc.


            Still, I understand why you and Ray would feel this request is a low priority.

            Cheers,

            p.
            :: ::



            • #7
              Originally posted by Hex Angel
              Q: What's the cheapest upgrade for any piece of software?
              A: RTFM.


              Originally posted by Hex Angel
              Generally, yes. To be honest, I'd overlooked the implicit equivalence of a Must List and Zoom's start-point list, because it doesn't quite work that way for me. The script that generates one of my sites doesn't think

              http://www.mydomain.com/bibs
              and
              http://www.mydomain.com/bibs/

              are equivalent. It hiccups on the latter, and Zoom won't index the script output. Moreover, unless I list the full path with the trailing slash, Zoom will also follow links to pages in

              http://www.mydomain.com/
              http://www.mydomain.com/mugs
              etc.

              Although I don't quite have the full picture, from your description it doesn't sound like this should be a problem. If you want to prevent Zoom from indexing URLs such as:

              http://www.mydomain.com/
              http://www.mydomain.com/mugs

              when indexing from a start point of http://www.mydomain.com/bibs

              ... then you should change the base URL for that start point. By default, Zoom presumes the base URL is http://www.mydomain.com, which is why the two URLs listed above would be indexed. If you only want it to index pages under, say, http://www.mydomain.com/test/, then you can simply specify that as your base URL. If you do not want the spider to follow ANY links at all from that start point, then you can change the spider option to "Index single page only". Take a look at the Users Guide chapter on "Start spider URL" for more information.
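              The base URL behaviour described here can be modelled as a prefix test. The following rough Python sketch rests on two assumptions. First, as the reply implies, the spider follows a link only when the link's URL begins with the base URL. Second, the default base is the scheme plus host of the start point. The names default_base_url and within_base_url are hypothetical, and a real spider normalizes URLs more carefully than this.

```python
from urllib.parse import urlsplit


def default_base_url(start_url):
    # Hypothetical default: scheme + host of the start point,
    # e.g. "http://www.mydomain.com/bibs" -> "http://www.mydomain.com".
    parts = urlsplit(start_url)
    return f"{parts.scheme}://{parts.netloc}"


def within_base_url(url, base_url):
    # Assumed rule: follow a link only if its URL starts with the base URL.
    return url.startswith(base_url)


if __name__ == "__main__":
    links = [
        "http://www.mydomain.com/",
        "http://www.mydomain.com/mugs",
        "http://www.mydomain.com/bibs/page1.html",
    ]
    base = default_base_url("http://www.mydomain.com/bibs")
    # Default base (the domain): every link qualifies, /mugs included.
    print([u for u in links if within_base_url(u, base)])
    # Narrowed base with a trailing slash: only pages under /bibs/.
    print([u for u in links if within_base_url(u, "http://www.mydomain.com/bibs/")])
```

              Under a pure prefix rule, the trailing slash matters. A base of http://www.mydomain.com/bibs would also admit a hypothetical sibling directory such as http://www.mydomain.com/bibs-old/.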
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine



              • #8
                Hi Ray,

                Thanks for the reply.

                I can't reproduce the error I was getting previously, so I can't really even explain to myself what was going wrong.

                And thanks also to you guys for fixing the Windows Media Center crash without even having a platform to test it on. Nice work!

                p.
                :: ::
