Misleading Context Results - Display all hits on a page


  • Misleading Context Results - Display all hits on a page

    If a search term appears more than once on an HTML page, the search results list shows only the first occurrence and, with context turned on, the context for that result. This can be misleading; e.g. searching for Barber on a site of mine gives a result showing only one Barber even though others are on the same page. The user might conclude that only the one Barber is listed and not look at the page if that one isn't the one they are looking for - even though they would see it (and highlighted) if they actually looked at the page. In another example, I have a page with about 5 instances of "appointment" but only the first instance and its context are shown in a search results list. If they are on different pages, then they all show up. Any help on this?
    Thanks in advance
    whk

  • #2
    I am curious too to know the answer!

    Roger


    • #3
      In fact Zoom will often show several pieces of context from a document (usually as a result of a multi-word or wildcard search). But you are correct that, if there are multiple hits, the context shown comes from near the top of the document.

      Displaying a new result per hit on the same document, or even some context text per hit, doesn't make sense, as some common words might appear in a single document thousands of times. You might end up with thousands of lines of unwanted output.

      You can also increase the size of the context description. You can do this via the "Results Layout" tab of the Configuration window.

      Finally, the real killer is that searching for all hits in the same document is really inefficient. At the moment we can just jump directly to the first piece of relevant context and print it out, which is very fast. But doing a linear search of all the content of all documents would slow down the search significantly.
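
A middle ground between "first hit only" and "every hit" is to cap the number of context snippets per page. The sketch below is only an illustration in Python, not Zoom's code: the function name `context_snippets` and the parameters `max_hits` and `radius` are made up for this example. It still scans the page text linearly, which is exactly the cost described above, but it stops as soon as the cap is reached.

```python
import re


def context_snippets(text, term, max_hits=3, radius=30):
    """Return up to max_hits snippets of text surrounding term.

    Hypothetical helper for illustration only. The scan stops once
    max_hits snippets are collected, so a very common word cannot
    produce thousands of lines of output.
    """
    snippets = []
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take `radius` characters on each side of the hit, clamped to the text.
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        snippets.append(text[start:end].strip())
        if len(snippets) >= max_hits:
            break
    return snippets


# A page that mentions "appointment" five times yields only two snippets here.
page = "Call the front desk for an appointment. " * 5
for snippet in context_snippets(page, "appointment", max_hits=2, radius=15):
    print("...", snippet, "...")
```

Whether a cap like this is worth the extra scanning time depends on the size of the indexed pages; the sketch says nothing about how Zoom's index locates its first piece of context.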


      • #4
        I am not sure if by document you mean one page or the whole site if more than one page. What bothers me is that if a term appears twice on the site, both occurrences will be shown in a search if they are on different pages but only one will be shown if they are on the same page. Therefore whether or not the second occurrence is shown as a search result could depend solely on where a page break is. Searching for restaurant in a site where both Tom's Restaurant and Joe's Restaurant are recommended will do disservice to poor Tom if his restaurant is on the same page as Joe's but is below it, whereas if a page break puts them on separate pages Tom will get a separate mention in the search.
        whk


        • #5
          There is always a compromise between maintaining simplicity and readability/convenience (so that the end user does not feel overwhelmed with information), and trying to return as much information as possible.

          What Zoom does is similar to what most popular, Internet-wide search engines such as Google and Yahoo do. A page which is considered more relevant (because the matched word appears more often within the page) will have a higher score, and will also be higher up in the list of results. This behaviour is what the majority of end users will be expecting.

          If you do a search for "restaurant" on Google, you will find similar behaviour to what Zoom is doing. It does not attempt to display more than one occurrence of the word "restaurant" in the context description, even when multiple occurrences are found.

          Your suggested scenario with "Tom's restaurant" and "Joe's restaurant" is to be expected, I think, if you have these listed on the same page. Essentially, this is a "List of restaurants" page that Zoom has found. You can say that because you have placed Joe higher up on the page, it will always be a disservice to Tom, who's always gonna be further down the page, even if we were to list both in the search results. This is really a consequence of the fact that you have them listed on one page, in a certain order, as opposed to having individual pages for each one. What if there are over 100 restaurants mentioned on your page? We will have to draw the line somewhere, instead of displaying the entire page in the search results, and someone on the list will be excluded.

          At the end of the day, you have to realize that search results never do, and never can, give the end user all the information they need without them going to the page. In Zoom, however, you do have the ability to make your context description larger.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
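
The scoring idea in the post above (a page where the matched word appears more often ranks higher) can be sketched in a few lines of Python. This is a simplified illustration, not Zoom's actual scoring: the function `rank_pages` and the sample page names and text are invented for the example, and a real engine would weigh more than a raw count.

```python
import re


def rank_pages(pages, term):
    """Rank pages by how often term occurs (whole word, case-insensitive).

    Hypothetical helper: `pages` maps a page name to its plain text.
    Pages with no occurrence are left out of the result.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    scored = []
    for name, text in pages.items():
        hits = len(pattern.findall(text))
        if hits:
            scored.append((name, hits))
    # Highest count first; one entry per page no matter how many hits it has.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


pages = {
    "dining.html": "Tom's Restaurant is nearby. Joe's Restaurant is downtown.",
    "welcome.html": "The nearest restaurant is a short walk away.",
    "library.html": "The library opens at nine.",
}
print(rank_pages(pages, "restaurant"))
# [('dining.html', 2), ('welcome.html', 1)]
```

Note that the page listing both restaurants appears once, at the top, rather than once per restaurant, which is the behaviour being discussed in this thread.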


          • #6
            Context Flexibility

            OK - I'll rethink how to handle Tom and Joe's problem from this end. However, I have a new request to do with context. Would it be possible to optionally specify where the context should start with respect to the search term? Example: context should be X + 30 (meaning the context phrase should start with the search term and contain the next 30 words), or 30 + X (meaning the reverse), or 15 + X + 15 (search term should be in the middle, which seems to be roughly the default). The reason: as it is now, if I have a list with Joe's Restaurant, his address, phone number and hours of operation, followed by Tom's Restaurant, address, phone number and hours of operation, and I search for Tom's, I get Joe's phone number and hours of operation followed by "Tom's Restaurant" and address and phone number - not good. It would obviously be better if I could specify that the search term be at the beginning of the context phrase. On the other hand, if I were searching a novel it would probably be better to have it in the middle. Design flexibility would be nice here.
            whk
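
The X + 30, 30 + X and 15 + X + 15 options requested above amount to two numbers: how many words to show before the hit and how many after it. A minimal Python sketch follows, assuming a hypothetical function `context_window` with `before` and `after` parameters; Zoom does not offer such a setting, and the listing text and phone numbers are made up.

```python
def context_window(text, term, before=15, after=15):
    """Return the first hit for term with `before`/`after` words around it.

    Hypothetical helper for illustration. Matching is a simple
    case-insensitive substring test on each whitespace-separated word.
    Returns None when the term does not occur.
    """
    words = text.split()
    needle = term.lower()
    for i, word in enumerate(words):
        if needle in word.lower():
            start = max(0, i - before)
            end = min(len(words), i + after + 1)
            return " ".join(words[start:end])
    return None


listing = ("Joe's Restaurant 12 Main St 555-0100 Open 9-5 "
           "Tom's Restaurant 34 Oak Ave 555-0199 Open 11-9")

# "X + 7": the context starts at the search term.
print(context_window(listing, "Tom's", before=0, after=7))
# Tom's Restaurant 34 Oak Ave 555-0199 Open 11-9

# "4 + X + 2": Joe's phone number and hours leak into Tom's context.
print(context_window(listing, "Tom's", before=4, after=2))
# St 555-0100 Open 9-5 Tom's Restaurant 34
```

The second call reproduces the complaint: with words shown before the hit, the previous entry's details are carried over into the context for Tom's.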


            • #7
              The best solution in this case would be for you to redo your site as a true database-driven site (e.g. PHP with SQL records for each restaurant). Then you could search by restaurant name, post code, pricing, type of food, etc.

              That is to say, you are really asking for structured, multi-criteria searching and want the results to be displayed in a structured fashion, i.e. a list of records that match the search. This is what SQL and PHP are good at.
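
To make the "one record per restaurant" idea concrete, here is a small sketch. The post suggests PHP with SQL; this example uses Python's built-in `sqlite3` module purely for illustration, and the table name, columns, and sample rows are all assumptions invented for the example.

```python
import sqlite3

# In-memory database with one row per restaurant (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants (name TEXT, cuisine TEXT, postcode TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
    [
        ("Joe's Restaurant", "Italian", "00000", "555-0100"),
        ("Tom's Restaurant", "Diner", "00000", "555-0199"),
    ],
)


def find_restaurants(conn, name_part=None, cuisine=None):
    """Return matching records; each criterion is optional."""
    query = "SELECT name, cuisine, postcode, phone FROM restaurants WHERE 1=1"
    params = []
    if name_part:
        query += " AND name LIKE ?"
        params.append(f"%{name_part}%")
    if cuisine:
        query += " AND cuisine = ?"
        params.append(cuisine)
    query += " ORDER BY name"
    return conn.execute(query, params).fetchall()


# Each hit is a complete record, so Tom's details never mix with Joe's.
print(find_restaurants(conn, name_part="Tom"))
# [("Tom's Restaurant", 'Diner', '00000', '555-0199')]
```

Because every restaurant is its own record, a search returns each match separately with only its own address and phone number, regardless of how the records would be laid out on a page.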


              • #8
                I agree with the above: given its nature, your site really ought to be a database-driven site. You have data that needs to be stored as separate, individual records, instead of being listed on a single static web page. You are asking Zoom to distinguish between items in a mass of data on one page, where no distinction is actually specified or given. Zoom is not designed to do this.

                Your suggestion of a user-controllable context is a limited solution. First, it only works for web pages such as the one you're currently looking at, where each record is always roughly the same size (within a set number of words). Second, it would be rare for this context setting to apply equally well to all your other pages. For example, you may have another page on the same site which merely lists names and phone numbers, and having a context of 30 words from the search term would include many other phone numbers. You would need a different setting for each page indexed.
                --Ray


                • #9
                  I agree that the way I have described the site makes it sound like a database is the best way to go. Actually it isn't. The site is an on-line version of a printed manual which is mainly for newcomers to our community (a retirement center). The manual consists of text paragraphs discussing the neighborhood plus some interspersed lists such as the one about local restaurants. In any event, I have found how to do what I want; simply remove the code phrase $startpos = $startpos - $gobackbytes from search.php. Granted the context phrases from some searches are better without this change, but at least the context phrases resulting from a search for a business start with the name of that business and not with carryover details from the preceding business (if there is one) on that list. People won't be calling Joe's restaurant when they want Tom's. Probably the user shouldn't use the phone number from the context phrase anyway, but they might. With the change in the code the wrong number is no longer displayed in the context phrase so that's another problem solved.
                  whk


                  • #10
                    Needless to say we don't support custom versions of the script.

                    While your code change may fix your problem when searching for Joe's restaurant, it will detrimentally affect other searches. Plus it still doesn't result in all hits being displayed (I am sure you realise this, but I don't want other people editing the script thinking this is some type of solution to display all hits).


                    • #11
                      I know this change has nothing to do with displaying all hits. I am only using this modified code for the one site. I use Zoom for other sites also but have no intention of using anything but unmodified Zoom on those. I still think it is a fine product and would have no hesitation in recommending it to others.
                      whk


                      • #12
                        Cool. Glad to hear you've found a solution that works to your preference.

                        We cannot provide support for customized/modified scripts, so we tend not to encourage users to go down this road (especially those who are not particularly experienced with scripting). It is, however, something we intentionally leave possible for more advanced users. As you will have found, our scripts are not obfuscated (unlike many other script-based products out there) and they are purposely kept readable to allow for this flexibility.
                        --Ray
