Thread: facebook iframe - zoom search?

  1. #1
    Join Date
    Aug 2009
    Posts
    35

    facebook iframe - zoom search?

    Hi guys

    We are getting duplicates in our Zoom search results.
    The URLs are almost exactly the same:
    http://www.example.com/test.htm
    and
    http://www.example.com/test.htm/

    When I compare the source code of both pages, they are identical except for
    the URL (as above) and the Facebook iframe, whose link differs only in whether the trailing slash appears. E.g.:

    <iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.example.com/test.htm/" ></iframe>

    So I wrapped the iframe in Zoom tags, e.g.:

    <!--ZOOMSTOP-->
    <!--ZOOMSTOPFOLLOW-->
    <iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.example.com/test.htm/" ></iframe>
    <!--ZOOMRESTARTFOLLOW-->
    <!--ZOOMRESTART-->

    Is this enough to make sure Zoom doesn't open the iframe content?
    Shouldn't Zoom disregard the iframe content, since it is on a different domain (facebook.com)?
    What else can I do?

  2. #2
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,826

    Several things of note:

    1) Zoom would not be following that facebook.com link unless you:
    (a) have multiple start points, one of which is facebook.com
    (b) have multiple base URLs for the start point, which allows facebook.com to be considered part of the same start point.
    (c) have set the spidering options for the start point to "Index page and follow internal and external links" (after clicking on "More"->"Edit")

    If none of the above is the case, then I'd suspect there's another link somewhere on your site which is going to that URL, rather than the facebook link.

    2) Technically, the following are two very different URLs:
    http://www.example.com/test.htm
    http://www.example.com/test.htm/

    The latter is, in fact, a directory named "test.htm". However, a web server can be configured to rewrite URLs and automatically attempt to find a matching filename, ignoring the fact that a folder was actually requested. When this happens, the server is simply compensating, while the client (i.e. the browser or, in this case, the spider) is none the wiser and is given no clue that the two were treated as the same URL.
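
    To make the difference concrete, here is a minimal Python sketch. It is not part of Zoom, and the names LinkCollector and find_trailing_slash_links are made up for this example. It shows that a client sees two different paths, and one possible way to scan a page for links that point at the trailing-slash form, which may help track down a stray link on the site:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

a = "http://www.example.com/test.htm"
b = "http://www.example.com/test.htm/"

# A client compares URLs literally, so these are different resources to it.
paths_match = urlsplit(a).path == urlsplit(b).path
print(paths_match)  # False


class LinkCollector(HTMLParser):
    """Collects every href and src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_trailing_slash_links(html: str) -> list[str]:
    """Hypothetical helper: return links whose last path segment looks like
    a file name (contains a dot) but is followed by a trailing slash."""
    collector = LinkCollector()
    collector.feed(html)
    suspects = []
    for link in collector.links:
        path = urlsplit(link).path
        last_segment = path.rstrip("/").rsplit("/", 1)[-1]
        if path.endswith("/") and "." in last_segment:
            suspects.append(link)
    return suspects


sample = '<a href="/test.htm/">bad</a> <a href="/test.htm">ok</a> <a href="/docs/">dir</a>'
print(find_trailing_slash_links(sample))  # ['/test.htm/']
```

    The "contains a dot" test is only a heuristic: a real directory whose name contains a dot (for example "/v1.2/") would also be flagged, so treat the results as candidates to check by hand.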

    Having said that, what Zoom can do is look at the page content and decide if it is truly a duplicate page, and reject it if so. This setting can be found under "Configure"->"Scan options"->"Use CRC to skip files with identical content".

    Note that the page must be completely identical to work, so if it has something which is dynamic (e.g. the current date and time is printed at the top of the page, or it contains advertising), then it will not be recognized as being identical.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
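
    As a rough illustration of the behaviour Ray describes, the sketch below skips pages whose content checksum has already been seen. It uses Python's zlib.crc32 and is an assumption about the general technique, not Zoom's actual implementation; the function names content_crc and skip_duplicates are invented for the example.

```python
import zlib


def content_crc(body: bytes) -> int:
    """CRC-32 checksum of the raw page content."""
    return zlib.crc32(body)


def skip_duplicates(pages: dict[str, bytes]) -> list[str]:
    """Return the URLs to keep, skipping any page whose content checksum
    matches that of a page seen earlier."""
    seen = set()
    kept = []
    for url, body in pages.items():
        crc = content_crc(body)
        if crc in seen:
            continue  # same checksum as an earlier page: treat as duplicate
        seen.add(crc)
        kept.append(url)
    return kept


# Identical content under two URLs: only the first one is kept.
identical = {
    "http://www.example.com/test.htm": b"<html><body>Hello</body></html>",
    "http://www.example.com/test.htm/": b"<html><body>Hello</body></html>",
}
print(skip_duplicates(identical))  # ['http://www.example.com/test.htm']

# A single changing byte (here, a printed time) defeats the check.
dynamic = {
    "http://www.example.com/test.htm": b"<html><body>10:00 Hello</body></html>",
    "http://www.example.com/test.htm/": b"<html><body>10:01 Hello</body></html>",
}
print(len(skip_duplicates(dynamic)))  # 2
```

    The second case is why any per-request content, such as a timestamp, an advert, or a link that embeds the requested URL, keeps two otherwise identical pages from being recognized as duplicates.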

  3. #3
    Join Date
    Aug 2009
    Posts
    35

    Quote Originally Posted by Ray
    Having said that, what Zoom can do is look at the page content and decide if it is truly a duplicate page, and reject it if so. This setting can be found under "Configure"->"Scan options"->"Use CRC to skip files with identical content".

    Note that the page must be completely identical to work, so if it has something which is dynamic (e.g. the current date and time is printed at the top of the page, or it contains advertising), then it will not be recognized as being identical.

    Thanks for the quick response, Ray.
    "Use CRC" is selected and works; I have a test page set up to check this.
    So there must be something that is not identical in these pages.
    When I check them they look identical, but maybe they only differ at the time the indexing is run.

    Any changing content is in ZOOMSTOP tags or is written to the page using javascript document.write commands.

    Would it be ok if I PM you some links?

  4. #4
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,826

    Quote Originally Posted by boxoffice
    Any changing content is in ZOOMSTOP tags or is written to the page using javascript document.write commands.

    I believe only V6 started disregarding ZOOMSTOP sections from the CRC duplicate detection. If you are still using V5, then this might be the reason.
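
    The following Python sketch shows what disregarding ZOOMSTOP sections before the checksum could look like. How Zoom actually does this internally is not stated in this thread, so the regular expression and the function name crc_ignoring_zoomstop are assumptions made purely for illustration.

```python
import re
import zlib

# Matches everything from a ZOOMSTOP tag up to the next ZOOMRESTART tag.
# ZOOMSTOPFOLLOW / ZOOMRESTARTFOLLOW do not match these exact tag names.
ZOOMSTOP_SECTION = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->", re.DOTALL)


def crc_ignoring_zoomstop(html: str) -> int:
    """Checksum of the page with ZOOMSTOP...ZOOMRESTART sections removed."""
    remaining = ZOOMSTOP_SECTION.sub("", html)
    return zlib.crc32(remaining.encode("utf-8"))


TEMPLATE = """<html><body>
<p>Same text on both pages.</p>
<!--ZOOMSTOP-->
<!--ZOOMSTOPFOLLOW-->
<iframe id="facebook_like" src="http://www.facebook.com/plugins/like.php?href={href}" ></iframe>
<!--ZOOMRESTARTFOLLOW-->
<!--ZOOMRESTART-->
</body></html>"""

page_a = TEMPLATE.format(href="http%3A%2F%2Fwww.example.com/test.htm")
page_b = TEMPLATE.format(href="http%3A%2F%2Fwww.example.com/test.htm/")

# The raw pages differ because of the iframe link...
print(page_a == page_b)  # False
# ...but match once the ZOOMSTOP section is left out of the checksum.
print(crc_ignoring_zoomstop(page_a) == crc_ignoring_zoomstop(page_b))  # True
```

    If the checksum is instead taken over the whole page, as the V5 behaviour described above would imply, the one-character difference in the iframe link is enough for the two pages to be treated as distinct.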

    Quote Originally Posted by boxoffice
    Would it be ok if I PM you some links?

    Sure.
    --Ray
