Skipped Word


  • Skipped Word

    I noticed that skipped words are only skipped if they are entered without any other words in the search box. For example, OF is skipped if entered by itself; however, when the input is University OF (two words), the indexer finds every instance of both University and OF.

    I would think it would skip the word in all instances, except when it is used in a quoted phrase such as "University OF".

    Am I doing something wrong in my configuration or am I missing something here? I have that word in the skipped word list and also have it set to skip words of less than 2 characters. Ran the indexer several times and uploaded more than once to make sure I uploaded correctly. The indexer is finding each and every instance for OF when used with another word in the search box.

    I am trying to make the search results more concise and less cluttered. Also, I would imagine that it would reduce the file size on the server.

    Thanks.

  • #2
    Are you sure you are actually getting every instance for the word "of", when searching for "university of" (without quotes)?

    Note that the word "of" is very common, so it is likely that Zoom only looked up the word "university" and that all the results happen to contain the word "of" as well. The scoring etc. is entirely dependent on only one word having been matched in the index (e.g. all results should say "Terms matched: 1").

    However, what might be confusing you is that Zoom also highlights all instances of the word "of" that appear in the search results, even though it did not look the word up in the index. This might have led you to think that Zoom searched for both words.

    The real indication should be the "Terms matched" value and the message at the top of the results:

    The following word(s) are in the skip word list and have been omitted from your search: "of"
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
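
The behaviour described in reply #2 can be illustrated with a minimal Python sketch. This is not Zoom's actual code: `SKIP_WORDS`, `split_query` and `highlight` are hypothetical names, and square brackets stand in for the highlight styling. It only shows how a skip word can be dropped from the index lookup and still be highlighted in the result text.

```python
import re

SKIP_WORDS = {"of"}  # hypothetical skip word list


def split_query(query):
    """Split a query into terms to look up in the index and terms to skip."""
    terms = query.lower().split()
    searched = [t for t in terms if t not in SKIP_WORDS]
    skipped = [t for t in terms if t in SKIP_WORDS]
    return searched, skipped


def highlight(text, query):
    """Wrap every query word found in text in [brackets], skipped or not."""
    words = [re.escape(w) for w in query.lower().split()]
    if not words:
        return text
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "[" + m.group(0) + "]", text)


searched, skipped = split_query("University OF")
print("Terms searched:", searched)  # only these would be looked up
print("Terms skipped:", skipped)
print(highlight("The University of Sydney", "University OF"))
```

In this sketch only "university" reaches the lookup step, yet both words are bracketed in the output, which is why a results page can look as if the skip word had been searched for.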



    • #3
      1st Occurrence

      Yes, it appeared to be finding the word even though it was in the skipped word list. I don't understand why it would highlight the word though - When you look at the search results, there are so many highlighted words that it makes it hard to pinpoint. Other than turning highlight OFF is there a way to stop this?

      Also, another question off the subject a bit: Is there a way to have a category limited to a particular directory (other than using ZOOMCATEGORY in the META)? Suppose I want the root directory as one category and then several sub-directories as separate categories - Is this possible?

      Another thing I might be able to do, if this is possible, is to give higher weightings to specific file types - Have .html documents listed before .pdf documents in the search results. Can this be done?

      Lastly, and this too is off the subject, did you know that Zoom can index 2 separate password protected directories (with the same log-in info) but they must be in a particular order in the SPIDER listing of directories? And I think this order may be alphabetical. In other words, if you want to index /file/A_protected and /file/B_protected, I think they need to be listed in this order in the SPIDER list. If I put /file/B_protected ahead of /file/A_protected in the spider list it will index /file/B_protected but /file/A_protected will error out with a 401 error. Reversing the order so that they are alphabetical results in a correct search of both directories with no errors. Now, someone may ask the obvious next question: what about indexing two directories that have 2 separate login passwords?

      Sorry to be longwinded - Trying to get this all worked out.

      Thank you.
      Last edited by WilliamJ; Feb-27-2007, 03:59 PM.



      • #4
        Originally posted by WilliamJ View Post
        Yes, it appeared to be finding the word even though it was in the skipped word list. I don't understand why it would highlight the word though - When you look at the search results, there are so many highlighted words that it makes it hard to pinpoint. Other than turning highlight OFF is there a way to stop this?
        It should only be highlighting words that match your search query. If you searched for the word "of", even though Zoom considers this a "skip word" and did not look for the word in the index (and the index does not contain data on this word), it will still highlight the word if it happens to appear in the results.

        No, you cannot disable highlighting exclusively for words in the skip list. It would seem to me that if you did not want to _find_ the word "of", you would not be searching for the word "of"?

        If you find the highlighting colours too obtrusive, perhaps you should simply consider changing the appearance of the highlighting. You can do this via CSS. For example, you could change it to:

        Code:
        .highlight { font-weight: bold; }
        Which would simply make highlighted words bold, as opposed to being yellow in colour (the default).

        Originally posted by WilliamJ View Post
        Also, another question off the subject a bit: Is there a way to have a category limited to a particular directory (other than using ZOOMCATEGORY in the META)? Suppose I want the root directory as one category and then several sub-directories as separate categories - Is this possible?

        Another thing I might be able to do, if this is possible, is to give higher weightings to specific file types - Have .html documents listed before .pdf documents in the search results. Can this be done?
        I'll answer these questions in the other threads where you have posted the same questions:
        http://www.wrensoft.com/forum/showthread.php?t=1574
        http://www.wrensoft.com/forum/showthread.php?t=1576

        Please post questions in one place. It gets confusing and we risk answering the same question twice, and we have to work out which is which afterwards. Remember that you can always make use of the "Edit" button and remove a question from one post if you later decide it deserves a separate thread of its own. This would at least save us time in having to do this for you (as we have had to do so previously). It gets very confusing for our readers (and people searching the forums) when you have the same question posted in multiple places.

        Originally posted by WilliamJ View Post
        Lastly, and this too is off the subject, did you know that Zoom can index 2 separate password protected directories (with the same log-in info) but they must be in a particular order in the SPIDER listing of directories? And I think this order may be alphabetical. In other words, if you want to index /file/A_protected and /file/B_protected, I think they need to be listed in this order in the SPIDER list. If I put /file/B_protected ahead of /file/A_protected in the spider list it will index /file/B_protected but /file/A_protected will error out with a 401 error. Reversing the order so that they are alphabetical results in a correct search of both directories with no errors. Now, someone may ask the obvious next question: what about indexing two directories that have 2 separate login passwords?
        No, the alphabetical order of the listing of URLs has no significance. I think what you observed would most likely be coincidence and is actually a behaviour caused by an unrelated issue.

        First question would be what type of authentication is your site using. Please see this support page for a detailed explanation:
        http://www.wrensoft.com/zoom/support/auth.html

        My guess is that, if you're using session/cookie-based authentication, your first URL/directory might contain a link to a logout page that you have not added to your skip list. Because of this, when the spider indexes the first URL it is logged out, and it is unable to log back in to access the second URL. That's purely a guess from what little information I have, but you should look into it. I can tell you that it's definitely not due to the alphabetical order of the directories.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
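
The logout-link guess in reply #4 can be sketched with a toy crawler in Python. Everything here is an assumption for illustration: `FakeSite`, `crawl`, the `/logout` path and the idea that /file/B_protected links to it are all hypothetical, and none of this is Zoom's spider. It only shows how following a logout link could make the order of the start URLs appear to matter.

```python
class FakeSite:
    """Toy site with cookie-style login; all paths are hypothetical."""

    def __init__(self, links):
        self.links = links        # page path -> list of linked paths
        self.logged_in = False

    def login(self):
        self.logged_in = True

    def fetch(self, path):
        """Return (HTTP status, links found on the page)."""
        if path == "/logout":
            self.logged_in = False  # visiting the link ends the session
            return 200, []
        if not self.logged_in:
            return 401, []
        return 200, self.links.get(path, [])


def crawl(site, start_urls, skip=()):
    """Log in once, then visit each start URL and the pages it links to."""
    site.login()
    status = {}
    for start in start_urls:
        queue = [start]
        while queue:
            path = queue.pop(0)
            if path in status or path in skip:
                continue
            code, links = site.fetch(path)
            status[path] = code
            queue.extend(links)
    return status


# Assumption: only the B directory contains a link to the logout page.
links = {"/file/B_protected": ["/logout"], "/file/A_protected": []}
b_first = ["/file/B_protected", "/file/A_protected"]
a_first = ["/file/A_protected", "/file/B_protected"]

print(crawl(FakeSite(links), b_first))                    # A gets 401
print(crawl(FakeSite(links), a_first))                    # both get 200
print(crawl(FakeSite(links), b_first, skip=("/logout",)))  # both get 200
```

Under these assumptions, the "alphabetical" order works only because the logout link is reached last, and skipping the logout page makes either order work.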



        • #5
          Skipped Word

          Thanks for your replies.

          Yes, I forgot to remove the other questions when I reposted them in the other areas. I don't post a lot to forums so I get a little rusty on the logistics.

          Regarding the password issue, all I know is that once I moved it so that it was alphabetical it indexed fine with no errors. The content of the directories is the same format (no logout or anything different in one versus the other). It works fine so unless I have other issues, I'll assume I have it worked out.

          Thanks again.
