facebook iframe - zoom search?


  • facebook iframe - zoom search?

    Hi guys

    We are getting duplicates in our Zoom search results.
    The URLs are almost exactly the same:
    http://www.example.com/test.htm
    and
    http://www.example.com/test.htm/

    When I compare the source code of both pages, they are identical except for
    the URL (as above) and the Facebook iframe, whose link differs only in whether the trailing slash appears. E.g.

    <iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.example.com/test.htm/" ></iframe>

    So I wrapped the iframe in Zoom tags, e.g.

    <!--ZOOMSTOP-->
    <!--ZOOMSTOPFOLLOW-->
    <iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.example.com/test.htm/" ></iframe>
    <!--ZOOMRESTARTFOLLOW-->
    <!--ZOOMRESTART-->

    Is this enough to make sure Zoom doesn't open the iframe content?
    Shouldn't Zoom disregard the iframe content, since it is on a different domain (facebook.com)?
    What else can I do?

  • #2
    Several things of note:

    1) Zoom would not be following that facebook.com link unless you:
    (a) have multiple start points, one of which is facebook.com
    (b) have multiple base URLs for the start point, which allows facebook.com to be considered part of the same start point.
    (c) have set the spidering options for the start point to "Index page and follow internal and external links" (after clicking on "More"->"Edit")

    If none of the above is the case, then I'd suspect there's another link somewhere on your site which is going to that URL, rather than the facebook link.
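
To illustrate point 1(b), here is a rough sketch of a prefix-style scope check. This is only an assumption about how base URLs behave, not Zoom's actual code; `in_scope` and the base URL lists are hypothetical names made up for the example.

```python
# Hypothetical sketch: a link is followed only if it starts with one of the
# configured base URLs. This is an assumed model, not Zoom's implementation.

def in_scope(url: str, base_urls: list[str]) -> bool:
    """Return True if the URL begins with any of the given base URLs."""
    return any(url.startswith(base) for base in base_urls)


facebook_link = (
    "http://www.facebook.com/plugins/like.php"
    "?href=http%3A%2F%2Fwww.example.com/test.htm/"
)

# With a single base URL for the site, the facebook.com link is out of scope.
single_base = ["http://www.example.com/"]

# With facebook.com added as a second base URL, the same link is in scope.
multiple_bases = ["http://www.example.com/", "http://www.facebook.com/"]

print(in_scope(facebook_link, single_base))
print(in_scope(facebook_link, multiple_bases))
```

Under that model the iframe link is only followed in the second configuration, which is why the duplicate more likely comes from an ordinary link somewhere on the site.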

    2) Technically, the following are two very different URLs:
    http://www.example.com/test.htm
    http://www.example.com/test.htm/

    The latter is, in fact, a request for a directory named "test.htm". However, a web server can be configured to rewrite URLs and automatically look for a matching filename, ignoring the fact that a folder was actually requested. When this happens, the server compensates silently: the client (i.e. the browser or, in this case, the spider) is none the wiser and is given no indication that both requests were treated as the same URL.
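
A quick way to see that the two URLs really are different is to compare their paths. The sketch below also includes a hypothetical helper, `normalize_trailing_slash`, which you could use when auditing your own site for the stray link. The rule it applies (drop the trailing slash when the last segment contains a dot) is a heuristic assumed for this example, not something Zoom or your web server does.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_trailing_slash(url: str) -> str:
    """Drop a trailing slash when the last path segment looks like a file name.

    Heuristic for link auditing only: a segment containing a dot
    (e.g. "test.htm") is assumed to be a file rather than a directory.
    """
    parts = urlsplit(url)
    path = parts.path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    if path.endswith("/") and "." in last_segment:
        path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


url_a = "http://www.example.com/test.htm"
url_b = "http://www.example.com/test.htm/"

# The raw paths differ, so a spider treats these as two separate pages.
print(urlsplit(url_a).path)  # /test.htm
print(urlsplit(url_b).path)  # /test.htm/

# After applying the heuristic, both collapse to the same URL.
print(normalize_trailing_slash(url_a) == normalize_trailing_slash(url_b))  # True
```

Running the helper over every link in your pages and listing the ones it changes would point to whichever link is producing the trailing-slash version.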

    Having said that, what Zoom can do is look at the page content and decide if it is truly a duplicate page, and reject it if so. This setting can be found under "Configure"->"Scan options"->"Use CRC to skip files with identical content".

    Note that the page must be completely identical to work, so if it has something which is dynamic (e.g. the current date and time is printed at the top of the page, or it contains advertising), then it will not be recognized as being identical.
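
To show why even a tiny dynamic element defeats this check, here is a minimal sketch using Python's `zlib.crc32`. It assumes the checksum is taken over the whole page content; the function name and the sample pages are invented for illustration and are not Zoom's internals.

```python
import zlib


def content_crc(page_bytes: bytes) -> int:
    """Return a CRC-32 checksum of the raw page content."""
    return zlib.crc32(page_bytes)


# Two byte-for-byte identical pages served under different URLs.
page_a = b"<html><body><p>Hello</p><p>Generated 12:00:01</p></body></html>"
page_b = b"<html><body><p>Hello</p><p>Generated 12:00:01</p></body></html>"

# The same page fetched a second later: only one character has changed.
page_c = b"<html><body><p>Hello</p><p>Generated 12:00:02</p></body></html>"

# Identical content gives the same checksum, so the second copy can be skipped.
print(content_crc(page_a) == content_crc(page_b))  # True

# A single changed character gives a different checksum, so both get indexed.
print(content_crc(page_a) == content_crc(page_c))  # False
```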
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Originally posted by Ray
      Having said that, what Zoom can do is look at the page content and decide if it is truly a duplicate page, and reject it if so. This setting can be found under "Configure"->"Scan options"->"Use CRC to skip files with identical content".

      Note that the page must be completely identical to work, so if it has something which is dynamic (e.g. the current date and time is printed at the top of the page, or it contains advertising), then it will not be recognized as being identical.
      Thanks for the quick response, Ray.
      "Use CRC" is selected and works; I have a test page set up to check this.
      So there must be something that is not identical in these pages.
      When I check them they look identical, but maybe they only differ at the time the indexing is run.

      Any changing content is either inside ZOOMSTOP tags or written to the page using JavaScript document.write calls.

      Would it be ok if I PM you some links?



      • #4
        Originally posted by boxoffice
        Any changing content is in ZOOMSTOP tags or is written to the page using javascript document.write commands.
        I believe V6 was the first version to disregard ZOOMSTOP sections in the CRC duplicate detection. If you are still using V5, then this might be the reason.
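
As a rough illustration of the difference, the sketch below strips everything between `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->` before taking the checksum. The regular expression and the function `crc_ignoring_zoomstop` are assumptions made up for this example and not the actual V6 implementation.

```python
import re
import zlib

# Matches one ZOOMSTOP ... ZOOMRESTART section, including the tags themselves.
ZOOMSTOP_SECTION = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->", re.DOTALL)


def crc_ignoring_zoomstop(html: str) -> int:
    """Checksum the page after removing ZOOMSTOP sections (assumed behaviour)."""
    stripped = ZOOMSTOP_SECTION.sub("", html)
    return zlib.crc32(stripped.encode("utf-8"))


TEMPLATE = """<html><body>
<p>Page text</p>
<!--ZOOMSTOP-->
<!--ZOOMSTOPFOLLOW-->
<iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href={href}" ></iframe>
<!--ZOOMRESTARTFOLLOW-->
<!--ZOOMRESTART-->
</body></html>"""

page_without_slash = TEMPLATE.format(href="http%3A%2F%2Fwww.example.com/test.htm")
page_with_slash = TEMPLATE.format(href="http%3A%2F%2Fwww.example.com/test.htm/")

# The raw pages differ because of the iframe link.
print(page_without_slash == page_with_slash)  # False

# Once the ZOOMSTOP section is ignored, the checksums match.
print(crc_ignoring_zoomstop(page_without_slash) == crc_ignoring_zoomstop(page_with_slash))  # True
```

If the checksum were taken over the raw page instead, the differing iframe link alone would be enough to make the two pages count as distinct.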

        Originally posted by boxoffice
        Would it be ok if I PM you some links?
        Sure.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
