Filtering double links?



    I use Zoom 5.0 on my webshop site, but my index contains many duplicate product pages and I have no idea how to eliminate the duplicate content. CRC checks and similar strategies don't work, because the content surrounding each product changes from page to page.

    My idea (in theory) is to use part of the link URLs to identify and eliminate the duplicates, but I don't know if that is possible. The problem arises because most of the products are linked to several product categories.

    One product can be reached through several different URLs, like these:

    a. http://mysite/index.php?product_id=12345
    b. http://mysite/index.php?product_id=1...ategory_id=789
    c. http://mysite/index.php?product_id=1...ategory_id=345
    d. http://mysite/index.php?category_id=...oduct_id=12345

    It's always the same product with the same title, description, image, etc., just in different surroundings.
    To make it harder, not every product has all four types of link (a., b., c. and d.): some products only have b. and d., others only have a., and so on.

    The best solution would be to tell Zoom: if the string "product_id=12345" appears in more than one page URL, keep one of those pages and delete the others.
    I know this can be done manually in the "Managing existing index" menu, but with more than a thousand products it would take hours.
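The thread does not say Zoom itself offers a rule like this, but the logic Michael describes is simple to state. Below is a minimal Python sketch, assuming you already have a plain list of indexed URLs; the function name dedupe_by_product_id and the example URLs (including their category_id values) are hypothetical and are not reconstructions of the truncated URLs above.

```python
from urllib.parse import urlsplit, parse_qs


def dedupe_by_product_id(urls):
    """Keep the first URL seen for each product_id and drop later duplicates.

    URLs without a product_id parameter (e.g. category pages) are kept as-is.
    """
    seen = set()
    kept = []
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        product_ids = params.get("product_id")
        if not product_ids:
            kept.append(url)  # not a product page, so nothing to deduplicate
            continue
        pid = product_ids[0]
        if pid in seen:
            continue  # same product already kept under another URL
        seen.add(pid)
        kept.append(url)
    return kept


if __name__ == "__main__":
    # Hypothetical URL variants of the same product, in different orders
    urls = [
        "http://mysite/index.php?product_id=12345",
        "http://mysite/index.php?product_id=12345&category_id=10",
        "http://mysite/index.php?category_id=20&product_id=12345",
        "http://mysite/index.php?category_id=20",
    ]
    for url in dedupe_by_product_id(urls):
        print(url)
```

Because parse_qs reads the query string by name, the parameter order (product_id first or category_id first) does not matter.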

    Is there a way to do this, or some other approach?

    Michael

  • #2
    If you can identify the parts of your pages that contain the more "valid" product links and the parts that contain the "less valid" ones, you can use the <!--ZOOMSTOPFOLLOW--> and <!--ZOOMRESTARTFOLLOW--> tags to stop the spider from following the links in the latter.

    For example, if you have certain pages that act as a comprehensive listing of your products (e.g. "browse_all.php?page=1", "browse_all.php?page=2", etc.), you can rely on the links there to find all the products.

    Elsewhere on your site, you may have many links to the same products using the various URLs you mentioned. These links can often be narrowed down to specific parts of a page, for example a "Recommended products" section or a "You may also be interested in..." section. Enclose these sections in ZOOMSTOP/ZOOMSTOPFOLLOW tags, and their links and content can be excluded from the index.
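As a rough illustration of where the tags go, here is a small Python sketch that renders a hypothetical "You may also be interested in..." box with its links wrapped in the <!--ZOOMSTOPFOLLOW--> / <!--ZOOMRESTARTFOLLOW--> pair quoted above. The function name, the base URL and the HTML markup are assumptions; on a PHP shop the same comment markers would simply be printed by the product template around the equivalent section.

```python
from html import escape


def related_products_html(product_ids, base="http://mysite/index.php"):
    """Render a hypothetical related-products box for a product page.

    The links are wrapped in ZOOMSTOPFOLLOW / ZOOMRESTARTFOLLOW comments so
    the spider is told not to follow them; the products are assumed to be
    reachable through a comprehensive listing page instead.
    """
    items = "\n".join(
        f'  <li><a href="{escape(base)}?product_id={int(pid)}">Product {int(pid)}</a></li>'
        for pid in product_ids
    )
    return (
        "<!--ZOOMSTOPFOLLOW-->\n"
        "<ul class=\"related\">\n"
        f"{items}\n"
        "</ul>\n"
        "<!--ZOOMRESTARTFOLLOW-->"
    )


if __name__ == "__main__":
    print(related_products_html([12345, 67890]))
```

The markers only affect link following; to keep the box's text out of the index as well, the ZOOMSTOP tags mentioned in the linked documentation would be needed.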

    More info on ZOOMSTOP tags here:
    http://www.wrensoft.com/zoom/support....html#zoomstop
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Zoomstop

      I already use the ZOOMSTOP and ZOOMRESTART tags, but that doesn't fix the problem described above, because the same template is always used to display the products. Only the "category_id" in the link URL varies; it places the product at a particular spot in the site map.
      In other words: same template, same content, different link URL. I think the only thing that distinguishes the indexed pages from each other is the URL.
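If the URL really is the only difference, the duplicates can be made to collapse by normalizing the URL. The Python sketch below, assuming category_id is the only parameter that varies between copies of a product page, drops that parameter and sorts the rest so that every variant maps to one canonical string. The function name and example URLs are hypothetical; this illustrates the reasoning rather than a Zoom feature.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_product_url(url, ignored=("category_id",)):
    """Return url with the ignored query parameters removed.

    The remaining parameters are sorted so that parameter order does not
    produce distinct strings for the same page.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignored]
    query.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


if __name__ == "__main__":
    variants = [
        "http://mysite/index.php?product_id=12345",
        "http://mysite/index.php?product_id=12345&category_id=10",
        "http://mysite/index.php?category_id=20&product_id=12345",
    ]
    # All three variants normalize to the same canonical URL
    print({canonical_product_url(u) for u in variants})
```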
      Michael



      • #4
        I think you should consider creating a new PHP page which just prints out a list of links to all your products. This page does not need to be accessible from the rest of your website as it will be purely for spidering purposes (although it can be a site feature if you want as well). It should be relatively easy to create such a page, assuming all the products are in a database. You can then make sure that all the products here are linked in a consistent manner, eg. with only a "product_id=" parameter, and no "category_id" parameter.
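The suggested page could look roughly like the following. The site uses PHP, but here it is sketched in Python with an in-memory SQLite database; the table name products, its columns product_id and title, and the base URL are all assumptions standing in for the real shop database. The point is that every product is linked exactly once, with only the product_id parameter.

```python
import sqlite3
from html import escape


def build_spider_page(conn, base="http://mysite/index.php"):
    """Return an HTML page linking every product once, by product_id only.

    Assumes a hypothetical table products(product_id INTEGER, title TEXT).
    """
    rows = conn.execute("SELECT product_id, title FROM products ORDER BY product_id")
    links = [
        f'<a href="{escape(base)}?product_id={int(pid)}">{escape(title)}</a><br>'
        for pid, title in rows
    ]
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>"


if __name__ == "__main__":
    # Hypothetical sample data in place of the real shop database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [(12345, "Widget"), (67890, "Gadget & Co")],
    )
    print(build_spider_page(conn))
```

Titles are HTML-escaped so that characters such as "&" in product names do not break the page.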

        You can then just add this page as a start point for the spider (click "More" on the Spider Mode tab), and add "category_id" to your skip list. This will ensure that all your products are indexed only once.
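To see why adding "category_id" to the skip list removes the duplicates, here is a small Python simulation. It assumes the skip list behaves as a plain substring match on each URL, which is how it is used in this suggestion; the function names and URLs are hypothetical, and the real spider has many more rules than this.

```python
def should_skip(url, skip_list):
    """Return True if url contains any skip-list entry (plain substring match)."""
    return any(entry in url for entry in skip_list)


def links_to_index(found_links, skip_list):
    """Keep each URL once, in discovery order, unless it matches the skip list."""
    seen = set()
    result = []
    for url in found_links:
        if should_skip(url, skip_list) or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


if __name__ == "__main__":
    found = [
        "http://mysite/index.php?product_id=12345",
        "http://mysite/index.php?product_id=12345&category_id=10",
        "http://mysite/index.php?category_id=20&product_id=67890",
        "http://mysite/index.php?product_id=67890",
    ]
    # Only the product_id-only URLs (as printed by the spider page) survive
    print(links_to_index(found, ["category_id"]))
```

This also shows the precondition: every product must be reachable through some URL without category_id, which is exactly what the dedicated listing page provides.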
        --Ray



        • #5
          This seems like a great workaround. Thanks.

