Weighting within page?

    Hi, I'm trying to use Zoom Search Engine for a website with several "sections" per HTML page. It is a static site, and everything is working fine. However:

    I would like the "context description" in the search results to show "section" title matches, and context, rather than showing the context for the first match on the page, which might be in a paragraph or be referring to one of these sections.

    For example, say I have a section on a page called "Fix my Bike" from a year ago, and a more recent section higher up on the same page called "Cycling Home" that contains the phrase "fix my bike". I would rather the "Fix my Bike" section formed the "context description" than the "Cycling Home" one. That way, people searching for "Fix my Bike" would not only see the most appropriate context below the search result, but could also use the "Jump to content" feature to get there.

    To try to make this happen, I have made sure the "sections" on my page use HTML headings for their titles, increased the weighting for headings in the Zoom options, and set "word position" to "no adjustment". But I think that only increases the overall weight of the page as a result, and the first match on the page is still the one shown as the context.

    Is there any way around this? Or is there even a way to show multiple "search results" for a single page, so that multiple contexts are shown? Surely it's not uncommon for a page to contain several matches, and for there to be a need to differentiate between them, especially for jump-to-match purposes.

    Hope there is a way to do this, and hope this is easy to understand. It could be I'm just missing something obvious...

  • #2
    Strange, I thought I answered this a few days back. Not sure what happened.

    Unfortunately, while you may have several sections on the same page that a human being like you or me can identify (because of familiar formatting or layout), HTML has no clear definition of what counts as a section within a page. So it is not really possible for a program (a robot or spider) to determine this, short of making a lot of assumptions about heading sizes, paragraphs, and text positioning that would be wrong in many other situations where the layout is not so straightforward.

    Zoom will only display up to 3 context hits by default. You can adjust the "size" of the context description (showing more words before and after the context hit) by changing the "Context size" setting ("Configure"->"Results Layout"). However, this won't be enough for you if you actually have many sections and you need it to skip over some.

    The best way to address this, both in the long run and for SEO and usability benefits, is to break these sections up into individual pages.

    Another, trickier, solution would be server-side scripting, where the page is altered depending on a parameter and the paragraphs/sections to be excluded are wrapped in <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags. From the spider's point of view, each variant would have to be a different URL and would be treated as a different page.
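The server-side idea above might look roughly like the following minimal Python sketch. It assumes the page's sections are already available as a name-to-HTML mapping; `render_for_spider()`, the section names, and the `section` URL parameter are hypothetical, not part of Zoom.

```python
# Minimal sketch of the server-side idea. The section names, the "section"
# URL parameter and render_for_spider() are hypothetical, not part of Zoom.
ZOOM_STOP = "<!--ZOOMSTOP-->"
ZOOM_RESTART = "<!--ZOOMRESTART-->"


def render_for_spider(sections: dict[str, str], keep: str) -> str:
    """Return the page HTML with every section except `keep` hidden from Zoom.

    `sections` maps a section name to its HTML, in page order. Sections other
    than `keep` are wrapped in ZOOMSTOP/ZOOMRESTART so the indexer skips their
    text, while a browser still renders the whole page.
    """
    parts = []
    for name, html in sections.items():
        if name == keep:
            parts.append(html)
        else:
            parts.append(f"{ZOOM_STOP}{html}{ZOOM_RESTART}")
    return "\n".join(parts)


if __name__ == "__main__":
    page = {
        "cyclinghome": "<h2>Cycling Home</h2><p>... fix my bike ...</p>",
        "fixmybike": "<h2>Fix my Bike</h2><p>How I fixed my bike.</p>",
    }
    # Would be served at its own URL, e.g. one.php?section=fixmybike
    print(render_for_spider(page, keep="fixmybike"))
```

Each variant still needs a link the spider can follow (one URL per section), and a visitor landing on any variant sees the full page, since the Zoom tags are ordinary HTML comments.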
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      Hi Ray, thanks for the reply. I totally understand what you're saying, and I already use the <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags as a clunky workaround. Excluding all the paragraph text this way leaves only the headings spiderable, which of course means none of the paragraph text is indexed. That is suboptimal. I do have the option of making a duplicate page without the ZOOMSTOP filtering, not giving the headings heading tags on that copy, and keeping headings high in the Zoom weighting options; a search for a heading phrase should then hopefully put the filtered page first, with the second result based on paragraph content. But this is too clunky, and duplicate pages for Google to discover are not a good idea for future-proofing.

      I would, however, disagree that every site would benefit from dividing content into one page per "section". My site doesn't need higher SEO rankings, and with up to 40 sections per page, my 20-page site would become huge and much harder to navigate; it's specifically designed this way to aid usability. There are also many reasons why my site and others might need a fair amount of content on a single page, all of it searchable, without only the highest match on the page being put forward.

      It seems the importance of searching within a page has been overlooked. From what you say, and what I've deduced, if a search phrase matches more than once on a page and the matches are more than a couple of sentences apart, only the first match is shown. Zoom also decides the order of results within a single page, preferring matches higher up the page, but offers no control over this. For instance, reversing it (giving priority to matches nearer the bottom of the page) would work well for my needs. Zoom is certainly not alone in this: Google's site search parameters give the same result. If there is more than one match on a page, only one result is shown, and only the highest match on the page is used as the context.

      But you are Zoom, and better than Google! I understand that a search engine can't determine what I want without being guided, but it shouldn't be so hard to implement. In fact, what I had tried to do (giving HTML headings within the page a higher weighting) already helps prioritise the page in the results, so surely it would be a similar procedure to prioritise the result and context within the page by the same criteria.

      So for instance zoom could:

      1. Have an option for Zoom to show more than one result per page, either as separate results in the list or as additional, distinct context description paragraphs under a given page's result, with a setting for how many such contexts to show. (Personally, I think the latter would be better.)

      2. Prioritise results and the context shown within the page using the same or similar weighting criteria as for the site-wide search, for instance HTML headings +4, paragraphs -1, etc.

      3. Perhaps have the option for a user specified tag to give priority (this could be used for site-wide results as well as in page results)

      4. <!--ZOOMSTOP--> and <!--ZOOMRESTART--> are great, but what about adding complementary tags such as <!--ZOOMPRIORITISE--> and <!--ZOOMDEPRIORITISE-->? Again, these could boost a phrase, paragraph, or section for site-wide results as well as in-page results.
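Suggestion 2 could work roughly as follows. This is a hypothetical Python sketch, not how Zoom selects its context descriptions: the `WEIGHTS` values and `ranked_contexts()` are illustrative assumptions, and HTML blocks are found with a simple regular expression rather than a full parser.

```python
import re

# Illustrative per-element weights in the spirit of "headings +4,
# paragraph -1". These numbers are assumptions, not Zoom settings.
WEIGHTS = {"h1": 4, "h2": 4, "h3": 3, "p": -1}

# Naive block matcher for <h1>..<h6> and <p>; a real implementation would
# use a proper HTML parser.
BLOCK_RE = re.compile(r"<(h[1-6]|p)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def ranked_contexts(html: str, term: str, limit: int = 3) -> list[str]:
    """Return up to `limit` blocks containing `term`, highest weight first.

    Blocks with equal weight keep their page order, so a heading match lower
    on the page can outrank an earlier paragraph match.
    """
    hits = []
    for order, m in enumerate(BLOCK_RE.finditer(html)):
        tag, text = m.group(1).lower(), m.group(2).strip()
        if term.lower() in text.lower():
            # Sort key: higher weight first, then earlier position on the page.
            hits.append((-WEIGHTS.get(tag, 0), order, text))
    hits.sort()
    return [text for _, _, text in hits[:limit]]


if __name__ == "__main__":
    page = ("<h2>Cycling Home</h2><p>I had to fix my bike on the way.</p>"
            "<h2>Fix my Bike</h2><p>Steps to fix my bike.</p>")
    print(ranked_contexts(page, "fix my bike"))
```

Because ties are broken by page order, plain paragraph matches still appear top to bottom; the weighting only changes which context is shown first.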

      Hopefully those aren't stupid ideas, and maybe they could be considered for V7.

      Thanks again for your reply and at least I know there is little that can be done for now, so I can give up gracefully!

      Ben
      Last edited by conradish; Mar-05-2013, 01:59 PM.

      • #4
        We understand the problem, and recognize the need for more control over which context descriptions to show. But note that in all the suggestions above, the link (after a user clicks on a search result) will still not be able to "jump to" the section required, again because there is a lack of section definition.

        Prioritising paragraphs could also be nice, but I don't know if it would help your problem, because a section that takes priority for one query would have less priority for another.

        There are many websites that present sections and articles in a "blog-like" listing. But note that they typically use a database or a CMS/blog package, so those sections also exist as individual pages and are merely presented together in that view.

        Having said all that, I've got an idea for you which would mimic that behaviour.

        I don't know how open you are to modifying your content files, or whether it's possible. If you are still in the process of writing and formatting them, this should be okay. But as noted before, I don't think there's any easy way to achieve something better without additional formatting/markup.

        For simplicity's sake, I'm assuming you're using Offline Mode (or can use it, and don't have any PHP or ASP pages to index). Let me know if otherwise.

        In the folder containing your 20 large pages, I'll assume you have something like this:

        one.html
        two.html
        three.html
        ... and so forth.

        Now in this same folder, create a subfolder named "one.html_sections"

        This folder will contain (say) 40 text files, one for each section in "one.html".

        So you would have a text file named "fixmybike" (no extension, no spaces) where you have copied and pasted the section of text from "one.html", and another file named "cyclinghome" containing text from that section.

        Back in your large "one.html" page, at the start of the "fix my bike" section, you'd need an HTML anchor tag like this:

        <a name="fixmybike"></a>
        <h1>Fix my bike</h1>
        <p>This is the section about fixing my bike.... etc. etc</p>
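Creating 40 files per page by hand is tedious. Assuming every section starts with an anchor like the one above, a short Python script could generate the `one.html_sections` folder. This is an unofficial sketch: `split_sections()` and `write_section_files()` are hypothetical helpers, the tag stripping is deliberately crude, and it assumes Zoom is happy to index these extension-less files as plain text.

```python
import re
from html import unescape
from pathlib import Path

# Assumes each section starts with <a name="..."></a> (or <a name="..." />),
# as in the example above. Text before the first anchor is ignored.
ANCHOR_RE = re.compile(r'<a\s+name="([^"]+)"\s*/?>(?:\s*</a>)?', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def to_text(fragment: str) -> str:
    """Strip tags and entities and collapse whitespace (crude HTML-to-text)."""
    return " ".join(unescape(TAG_RE.sub(" ", fragment)).split())


def split_sections(page_html: str) -> dict[str, str]:
    """Map each anchor name to the plain text up to the next anchor."""
    anchors = list(ANCHOR_RE.finditer(page_html))
    sections = {}
    for i, m in enumerate(anchors):
        end = anchors[i + 1].start() if i + 1 < len(anchors) else len(page_html)
        sections[m.group(1)] = to_text(page_html[m.end():end])
    return sections


def write_section_files(page: Path) -> list[Path]:
    """Write one extension-less file per section into <page>_sections/."""
    out_dir = page.with_name(page.name + "_sections")  # e.g. one.html_sections
    out_dir.mkdir(exist_ok=True)
    written = []
    for name, text in split_sections(page.read_text(encoding="utf-8")).items():
        target = out_dir / name  # e.g. one.html_sections/fixmybike
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # e.g. python split_sections.py one.html two.html
        for f in write_section_files(Path(path)):
            print(f)
```

Anything after the last anchor (such as a page footer) ends up in the last section's file, so you may want to trim it. Rerunning the script whenever a page changes keeps the section files in sync with it.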

        Now in Zoom, make sure you are indexing files with no extensions ("Configure"->"Scan options"->"Scan files with no extensions").

        Then go to "Configure"->"Indexing options" and check the option to "Rewrite all indexed URLs as follows...".

        Next to "Find in URL: " specify the following:
        _sections/

        Next to "Replace with:" specify the following:
        #

        Here is what this does. Normally, when Zoom indexes the following file:
        C:\MySite\pages\one.html_sections\fixmybike

        it creates a URL like this in your search results:
        http://www.mysite.com/pages/one.html_sections/fixmybike

        With the rewrite rule, it will create this URL instead:
        http://www.mysite.com/pages/one.html#fixmybike

        So it will now point to your "one.html" file, and the anchor tag will jump to the section you want.
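The effect of the rewrite rule can be checked with a small Python model. It assumes Offline Mode maps `C:\MySite` to `http://www.mysite.com/` as in the example, and that the rule is applied as a simple substring replacement; `indexed_url()` and `rewrite_url()` are hypothetical helpers, not Zoom functions.

```python
from pathlib import PureWindowsPath

# Example values from the post above; both are assumptions about your setup.
LOCAL_ROOT = PureWindowsPath(r"C:\MySite")
BASE_URL = "http://www.mysite.com/"


def indexed_url(local_file: str) -> str:
    """Map a local file to the URL Offline Mode would list (assumed mapping)."""
    rel = PureWindowsPath(local_file).relative_to(LOCAL_ROOT)
    return BASE_URL + "/".join(rel.parts)


def rewrite_url(url: str, find: str = "_sections/", replace: str = "#") -> str:
    """Apply the "Find in URL" / "Replace with" rule as a plain substring swap."""
    return url.replace(find, replace)


if __name__ == "__main__":
    before = indexed_url(r"C:\MySite\pages\one.html_sections\fixmybike")
    print(before)               # http://www.mysite.com/pages/one.html_sections/fixmybike
    print(rewrite_url(before))  # http://www.mysite.com/pages/one.html#fixmybike
```

One consequence of a plain substring rule: any other URL on the site that happens to contain "_sections/" would be rewritten too, so that suffix is best reserved for these folders.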

        I hope that is not too convoluted. But it's an idea anyway.
        --Ray

        • #5
          Many thanks for this, Ray. Your solution is nice, though it took me a couple of read-throughs to get it. It'll take a while to implement, but it would work really well. The only thing is, with several hundred text files to index, I'll need to upgrade to Pro!

          One thing that did occur to me from this was that Zoom could perhaps use anchors in its weighting, and then use those anchors for "jump to". This way, section headings (or whatever floats one's boat) could be prioritised, and there would be a section definition. Just a thought.
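The anchor idea can be illustrated by resolving a match position to the nearest preceding anchor and appending it to the result URL. This is a hypothetical Python sketch of the suggestion, not existing Zoom behaviour; `jump_url()` and treating `id` attributes as section starts are assumptions.

```python
import re

# Hypothetical: treat any <a name="..."> or id="..." attribute as the start of
# a section. This illustrates the suggestion only; it is not a Zoom feature.
ANCHOR_RE = re.compile(r'<a\s[^>]*\bname="([^"]+)"|\bid="([^"]+)"', re.IGNORECASE)


def jump_url(page_url: str, page_html: str, match_offset: int) -> str:
    """Return page_url#anchor for the last anchor at or before match_offset."""
    anchor = None
    for m in ANCHOR_RE.finditer(page_html):
        if m.start() > match_offset:
            break  # anchors are found in page order, so we can stop here
        anchor = m.group(1) or m.group(2)
    return f"{page_url}#{anchor}" if anchor else page_url


if __name__ == "__main__":
    page = ('<a name="cyclinghome"></a><h1>Cycling Home</h1><p>fix my bike</p>'
            '<h1 id="fixmybike">Fix my Bike</h1><p>steps</p>')
    print(jump_url("one.html", page, page.index("Fix my Bike")))
```

A match with no anchor above it simply falls back to the plain page URL, so pages without anchors behave as they do today.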

          Anyway, many thanks again for your efforts helping with this, and coming up with an ingenious solution.

          Conrad

          • #6
            Show multiple matches as separate results?

            Originally posted by conradish
            ... is there a way to show multiple "search results" for a single page in the search results page so that multiple contexts are shown. Surely its not uncommon for there to be several matches per page, and there be a need to differentiate results, especially for jump to match purposes.
            ....
            I tried to find a query similar to my need, and this is the only one that appeared to touch on it, so I'm adding my two cents here rather than starting a separate thread.

            I, too, would like to show separate results for multiple matches in a single document. The option to limit the number of repeat-results would be a plus.

            I have a very small site using Zoom v6 and have observed that it is possible to get a result that does not display my expected context, so on the surface it appears that my desired search was unsuccessful. A visitor might assume that there is nothing pertinent in a given result, or miss the second or third match when following it. I realize that once the result URL is followed, the visitor must still find the match location, but I would like to at least show the context of all matches (up to the specified limit). Another possibility would be to report 'Matches found...' as well as 'Terms matched...' on the info line of the result.
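The behaviour requested here (a context for each match up to a limit, plus a 'Matches found' count) might look something like the following hypothetical Python sketch. It is not Zoom's implementation; it works on plain text, and `context_words` is only loosely analogous to Zoom's "Context size" setting.

```python
import re


def _norm(word: str) -> str:
    """Lower-case a word and drop punctuation, so "bike." matches "bike"."""
    return re.sub(r"\W+", "", word).lower()


def match_contexts(text: str, term: str, context_words: int = 5,
                   limit: int = 3) -> tuple[int, list[str]]:
    """Count every occurrence of `term` and return up to `limit` snippets.

    Each snippet shows up to `context_words` words either side of a match.
    All matches are counted, even those beyond `limit`.
    """
    words = text.split()
    norm = [_norm(w) for w in words]
    target = [t for t in (_norm(w) for w in term.split()) if t]
    if not target:
        return 0, []
    n = len(target)
    count, snippets = 0, []
    for i in range(len(words) - n + 1):
        if norm[i:i + n] == target:
            count += 1
            if len(snippets) < limit:
                lo, hi = max(0, i - context_words), i + n + context_words
                snippets.append(" ".join(words[lo:hi]))
    return count, snippets


if __name__ == "__main__":
    text = ("Cycling home today I had to fix my bike. Later on, my notes on "
            "how to fix my bike properly. Fix my bike!")
    found, contexts = match_contexts(text, "fix my bike", context_words=2, limit=2)
    print(f"Matches found: {found}")
    for c in contexts:
        print(f"... {c} ...")
```

Counting all matches while capping the snippets keeps the result listing compact, yet still tells the visitor that a page has more hits than the ones displayed.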

            • #7
              Thanks for the suggestion. We're keeping it in mind and I'll update here if we can think of some way to implement it discreetly in the code.
              --Ray

              • #8
                Thanks for your acknowledgement and consideration.
