using zoom to index abantecart ecommerce website.

  • using zoom to index abantecart ecommerce website.

    Hello
    I have multiple AbanteCart stores and want to index only the product pages on each site, so that I can build a central search page.

    The issue is that each product page can be reached through four different URLs:

    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744
    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&path=65&product_id=744
    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744&currency=GBP
    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744&language=en

    So when I run the scan, I get four results for the same page.
    I have tried several approaches.

    Page and folder skip list:

    &path=
    &language=
    &currency=

    But then Zoom tells me that only 24 pages were indexed, and I know there are over 100 products.

    I also tried adding a marker word to the product pages only, and then adding a content filter for it:

    +indexme
    I have tried V6 and V7, with the same results.

    How can I spider every page but index only URLs in this format:

    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=#######

    where ####### = item number?

    Many thanks
    Carl

  • #2
    Originally posted by 4thstar
    How can I spider every page but index only URLs in this format:

    http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=#######

    where ####### = item number?
    Your page and folder skip list does achieve this. The problem is that your site very likely does not link to every product in that bare format. Some products are only linked to via a URL containing the "&path=" parameter (or one of the other parameters), so once those URLs are skipped, the spider never reaches them.

    The "+indexme" content filter will not help the spider find pages that it was never linked to. So that won't work for this purpose.
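    To illustrate why the skip list can leave you with far fewer pages than expected, here is a rough Python sketch of a crawl over a made-up link graph. The URLs in LINKS are hypothetical and not taken from your site, and the sketch assumes that a skipped URL is neither fetched nor indexed, so links found only on skipped pages are never seen.

```python
from collections import deque

# The three skip list entries from the original post.
SKIP_PATTERNS = ["&path=", "&language=", "&currency="]

# Hypothetical link graph (page -> links found on that page).
# Invented for illustration; not the real site structure.
LINKS = {
    "index.php": [
        "index.php?rt=product/product&product_id=1",
        "index.php?rt=product/category&path=65",
    ],
    "index.php?rt=product/category&path=65": [
        "index.php?rt=product/product&path=65&product_id=2",
        "index.php?rt=product/product&path=65&product_id=3",
    ],
}


def is_skipped(url):
    """True if the URL contains any skip list entry."""
    return any(pattern in url for pattern in SKIP_PATTERNS)


def crawl(start, use_skip_list):
    """Breadth-first crawl from start.

    Assumption: skipped URLs are not fetched at all, so the links
    they contain are never discovered.
    """
    seen = {start}
    queue = deque([start])
    indexed = []
    while queue:
        url = queue.popleft()
        indexed.append(url)
        for link in LINKS.get(url, []):
            if link in seen:
                continue
            if use_skip_list and is_skipped(link):
                continue
            seen.add(link)
            queue.append(link)
    return indexed


print(len(crawl("index.php", use_skip_list=False)), "pages without the skip list")
print(len(crawl("index.php", use_skip_list=True)), "pages with the skip list")
```

    In this toy graph, products 2 and 3 are only linked with "&path=" in the URL, so with the skip list active they are never found, even though a clean URL for each of them would be perfectly valid.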

    I would suggest undoing the earlier attempts, then trying one of the following:

    (1) Enable duplicate page detection (under "Configure"->"Scan options"->"Use CRC to skip files with identical content"). Note however that the URLs (the "4 different links" mentioned above) must actually generate IDENTICAL pages for this to work. If there is any dynamic part of the page, such as Google ads, or "Today's date", etc. then you will need to wrap these parts of the page with <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags so they can be excluded from the CRC.

    (2) Find a page (or create a PHP page) which links to every product using the required URL format only. Use this as your start spider URL, and consider limiting the indexing mode if you don't want any other links followed (click on the "More" button next to the start URL, click "Edit", select "Follow all links on this page only").

    (3) Come up with a complete list of the product URLs and import them as individual start points into Zoom (click "More"->"Import").
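
    For option (3), one possible way to build that list is to take whatever product URLs you can gather (from a sitemap, a database export, or an earlier crawl) and reduce each one to the bare product_id form. The Python sketch below is only an illustration: the function names are made up, the input list is hard-coded from the URLs in the question, and it assumes that one URL per line is a suitable format for the import, which you should check in Zoom before relying on it.

```python
from urllib.parse import urlsplit, parse_qs


def canonical_product_url(url):
    """Return the bare product URL, or None if this is not a product page."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if query.get("rt") != ["product/product"] or "product_id" not in query:
        return None
    product_id = query["product_id"][0]
    # Rebuild the URL with only the rt and product_id parameters,
    # dropping path, currency, language and anything else.
    return (
        f"{parts.scheme}://{parts.netloc}{parts.path}"
        f"?rt=product/product&product_id={product_id}"
    )


def build_start_points(urls):
    """Canonicalise and deduplicate, keeping first-seen order."""
    unique = {}
    for url in urls:
        canonical = canonical_product_url(url)
        if canonical is not None:
            unique.setdefault(canonical, None)
    return list(unique)


# Example input: the four variants from the original post.
GATHERED = [
    "http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744",
    "http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&path=65&product_id=744",
    "http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744&currency=GBP",
    "http://my-cart.co.uk/nikkismilitaria/index.php?rt=product/product&product_id=744&language=en",
]

# Print one start point per line, ready to save to a text file.
for start_point in build_start_points(GATHERED):
    print(start_point)
```

    All four variants collapse to a single start point here, and any URL that is not a product page is dropped from the list.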
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
