Page score


  • Page score

    I am using the Professional edition V7.1 (build 1002) licensed to the University of Sydney.

    Index the following 2 pages as example:

    (1) http://sydney.edu.au/engineering/peo....sukkarieh.php
    (2) http://sydney.edu.au/engineering/people/fabio.ramos.php

    (Index single page only mode for the above)

    Search for "robotic" (or "robotics") or "robotic and intelligent" (or "robotics and intelligent")

    In every case, Fabio Ramos (2) ranks higher than Salah (1). This is despite Salah having a greater number of valid mentions for these search terms.

    "robotics": Fabio 91 mentions, Salah >100.
    "robotics and intelligent": Salah 5 mentions, Fabio 1 mention.

    I have experimented with different config weightings (word weighting, word position, content density). This changes the scores, but it does not change the order of the results.

    How does Fabio (2) outscore Salah (1)? I need the reverse!

    Using the "Recommended" is not an option in my use case, and our profile pages use generic templates (meaning I am very limited re individual keyword stuffing / meta data and/or using ZOOMWORDS tag).

    Any advice as to how I can influence the indexer?

    The above "robotics" search is merely an example. I actually need to index 200+ academic profiles, and job seniority is a very important consideration for me (i.e. all things being equal for the same keyword, a Professor should rank higher than an Associate Professor, who should rank higher than a Doctor).

    Kind regards,
    Phil
    Last edited by philhenville; 06-06-2016, 02:06 AM.

  • #2
    Hi Phil,

    We had a closer look at the situations you mentioned.

    I'll first discuss the case we couldn't reproduce, namely the second use case:

    Originally posted by philhenville:
    "robotics and intelligent": Salah 5 mentions, Fabio 1 mention.
    When we had all weighting settings at "Normal" and "No adjustment" set for Word position/Content density/URL length (on the Weightings configuration tab), this use case did not rank Fabio over Salah. Salah was ranked above Fabio regardless of how we searched that query (any/all search words, exact phrase or otherwise).

    I suspect that this particular situation was due to one of the aforementioned settings, such as "Content density", which prefers smaller documents over larger ones. In that case, Fabio's page would be preferred because it is shorter.

    Originally posted by philhenville:
    "robotics": Fabio 91 mentions, Salah >100.
    This one we can reproduce, and can confirm.

    The reason is that, in our algorithm, once a word has occurred more than a certain number of times on the same page, we no longer consider further occurrences to be adding value to the page.

    Certainly in the realm of ~100+ occurrences, each extra occurrence counts for less, and we allow other factors to become more important in the relevance of the page, such as whether the word appears in the first or later portion of the page ("word position") and the size of the document ("content density").

    There are a few factors which cause the word "robotics" to be particularly affected in the scoring on these two pages.

    1) With "Stemming" enabled, "robotics" is considered the same as "robotic" and "robot" and "robots".
    2) You have a ZOOMPAGEBOOST +3 tag on both pages, which exacerbates the issue. You can try using negative/deboost values such as -9 on the Fabio page to help reduce its dominance.

    For the record, by my count at the HTML level, the Fabio page has 178 occurrences of "robotic" and the Salah page has 371 occurrences (excluding "robot" and "robots", filenames in links, etc.). So purely in terms of keyword occurrences, both pages are considered equally "super important".
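
    A count like this can be reproduced roughly with a short script. The sketch below is only an illustration and is not how the Zoom indexer counts words: `TextExtractor`, `count_word` and the sample markup are hypothetical names and data. It counts words in the page text that start with a given stem, ignores link filenames (attribute values are never treated as text), and still counts text that is hidden by CSS, since that text is present in the HTML.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Attribute values (e.g. href="robotics.pdf") never arrive here.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_word(html, stem):
    """Count words in the page text that start with `stem` (case-insensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return len(pattern.findall(text))


# Made-up sample markup, not taken from either profile page.
sample = (
    '<p>Robotics and intelligent systems. <a href="robotics.pdf">Field robotic'
    " platforms</a> and robots.</p>"
    '<div style="display:none">Hidden robotics text</div>'
)
print(count_word(sample, "robotic"))
```

    Searching for the stem "robotic" matches "robotic" and "robotics" but not "robot" or "robots", which mirrors the exclusions described above.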

    In fact, when we have all weightings set to zero, they are scored equally.

    Once the other weightings and factors kick in, however, it becomes possible for Fabio to be ranked higher.
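
    To make the effect concrete, here is a toy model of capped keyword scoring. It is an assumption-laden sketch and not Zoom's actual algorithm: the cap of 100, the density formula, the function names and the page sizes (100 KB and 200 KB) are all invented for illustration. Only the occurrence counts (178 and 371) come from the figures quoted above.

```python
def keyword_score(occurrences, cap=100):
    """Occurrences beyond `cap` add nothing (the cap value is hypothetical)."""
    return min(occurrences, cap)


def density_bonus(page_kb, weight):
    """Smaller pages get a larger bonus; a weight of 0 disables it.

    The formula is illustrative only.
    """
    return weight * 100.0 / page_kb


def page_score(occurrences, page_kb, density_weight=0.0):
    """Capped keyword score plus an optional content-density bonus."""
    return keyword_score(occurrences) + density_bonus(page_kb, density_weight)


# Occurrence counts from the thread; page sizes are made up.
smaller_page = {"occurrences": 178, "page_kb": 100}
larger_page = {"occurrences": 371, "page_kb": 200}

for weight in (0.0, 1.0):
    a = page_score(smaller_page["occurrences"], smaller_page["page_kb"], weight)
    b = page_score(larger_page["occurrences"], larger_page["page_kb"], weight)
    print(weight, a, b)
```

    With the density weight at zero, both pages hit the cap and tie. With any positive weight, the smaller page comes out ahead even though it has fewer occurrences, which is the behaviour described above.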

    I would advise the following:

    (1) Consider excluding more of the page from being indexed. I see you're already using ZOOMSTOP and ZOOMRESTART. Obviously it's up to you to decide what's relevant. Note that what's visible when you see the page in the browser is very different from the HTML: a fair bit of content is hidden from view behind the "More.." link.

    (2) Both of these pages are actually quite large (178 KB and 113 KB) and take a while to serve and download, so it's well worth considering whether there's a better way to format this information. Salah's page takes about 8 seconds to download here, and I'm on a fibre connection in Sydney.

    In fact, on closer inspection, a huge chunk of the content appears twice in the HTML due to the way the "Selected publications" are presented via both options to display "By type" or "By year". Given the size of the data in these two tabs, it's worth investigating an alternative to providing this option that does not rely on including the data twice or more on the page.

    I think this is the main reason there's so much data to swamp the results, and addressing this would help a lot.

    (3) Use ZOOMPAGEBOOST with deboost (negative values) to push down pages which might be swamping results.

    (4) Regarding your further categorization of information (job seniority, etc.), perhaps you should take a look at the Custom Meta Fields and Categories features, which would allow you to provide your users with more grouping information and search narrowing options. For example, you can have a custom meta field that specifies the staff member's title, and then the search results would be able to present a list of results in the form of:

    Refine your results by: Professors (3), Associate Professors (6), Doctors (10)

    Or you can allow the user to restrict their search to a particular group. Take a look at the example above for more usages.
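
    If the shared profile template has access to each staff member's title, suggestion (3) could also be driven by seniority. The sketch below is a hypothetical helper, not part of Zoom: the `SENIORITY_BOOST` values are placeholders that would need tuning against real search results, and the meta tag syntax is an assumption that should be confirmed against the Zoom documentation before use.

```python
# Hypothetical mapping from job title to a page boost value.
# Lower seniority gets a larger deboost; the numbers are placeholders.
SENIORITY_BOOST = {
    "Professor": 0,
    "Associate Professor": -1,
    "Doctor": -2,
}


def boost_meta_tag(title):
    """Return a ZOOMPAGEBOOST meta tag for a title, or "" if the title is unknown."""
    boost = SENIORITY_BOOST.get(title)
    if boost is None:
        return ""
    # Tag syntax is assumed here; check it against the Zoom documentation.
    return '<meta name="ZOOMPAGEBOOST" content="{}">'.format(boost)


for title in ("Professor", "Associate Professor", "Doctor"):
    print(boost_meta_tag(title))
```

    Because every page of the same grade gets the same boost, pages within a grade are still ordered by the normal relevance factors.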

    Hope that helps.

    p.s. As a USYD B.Eng (Software) graduate myself, it would certainly be nice to see Zoom on the sydney.edu.au website
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Thank you very much, Ray, for the prompt response. I am in the process of implementing your recommendations.

      I've been a small business user of Zoom Search since approx 2007, so it's great to see the solution survive and improve each version.

      Who was your favourite lecturer?
