
Thread: Duplicates not suppressed

  1. #1
    Join Date
    Apr 2012
    Posts
    4

    Default Duplicates not suppressed

    I've only recently started to tweak and configure my site, so I hope I'm overlooking something obvious. I have enabled "reload all files (do not use cache)" and "Use CRC to skip files with identical content," but links to the same page that differ only in capitalization still show up as unique pages. I opened each via IE, saved the source, and compared the files using FC, which reports that they are identical. They are .aspx pages, by the way, which makes me wonder whether some kind of "viewstate" tag is different, but this isn't the case when I open the pages via the browser.

    Zoom is a fantastic product, and this is a relatively minor issue, but because I have so many of these links, I'm hoping it won't be necessary to edit them on hundreds of pages.

    Thanks for any help.

  2. #2
    Join Date
    Dec 2004
    Location
    Sydney
    Posts
    4,286

    Default

    Filtering by CRC is an inefficient way to remove duplicate pages, as each page needs to be downloaded before its CRC can be calculated. It is better to filter on the URL using the page and folder skip list.

    So the best solution is to use consistent URLs on your site.

    What are the URLs of the two pages? We can take a look.

  3. #3
    Join Date
    Apr 2012
    Posts
    4

    Default Examples - duplicate search results

    Thanks for your quick reply. It sounds like I need to tackle making links consistent. As you can imagine, though, this is difficult with dozens of staff editing web pages. A great solution (in our situation) would be an option to make URL duplicate detection case insensitive.

    In case you want to investigate why CRC isn't working, here is a case where the same page shows up 3 times:



    1. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

    ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

    Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/TBBScatalogs.aspx


    2. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

    ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

    Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/tbbscatalogs.aspx


    3. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

    ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

    Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/TBBSCatalogs.aspx
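
The three URLs above differ only in letter case. As a rough illustration of what case-insensitive duplicate detection could look like, here is a minimal Python sketch. It is not how Zoom works internally; the helper names `case_insensitive_key` and `dedupe_urls` are made up for this example, and it assumes the server treats paths as case insensitive (as IIS typically does).

```python
from urllib.parse import urlsplit, urlunsplit


def case_insensitive_key(url: str) -> str:
    """Build a comparison key that ignores case in the scheme, host, and path.

    The query string is left untouched, since parameter values are often
    case sensitive even on servers with case-insensitive paths.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        "",  # drop any fragment; it never reaches the server
    ))


def dedupe_urls(urls):
    """Keep the first URL seen for each case-insensitive key."""
    seen = set()
    unique = []
    for url in urls:
        key = case_insensitive_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


urls = [
    "http://nlc.nebraska.gov/TBBS/TBBScatalogs.aspx",
    "http://nlc.nebraska.gov/TBBS/tbbscatalogs.aspx",
    "http://nlc.nebraska.gov/TBBS/TBBSCatalogs.aspx",
]
unique_urls = dedupe_urls(urls)
print(unique_urls)
```

Keeping the first spelling encountered is an arbitrary choice; any one of the variants would serve equally well as the canonical link.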

  4. #4
    Join Date
    Dec 2004
    Location
    Sydney
    Posts
    4,286

    Default

    I found a difference between your pages. This line of HTML source code varies between them:

    <form name="aspnetForm" method="post" action="TBBScatalogs.aspx" id="aspnetForm">
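
A difference like this can be found by comparing the two saved page sources line by line. The following is a small Python sketch using the standard `difflib` module. The two page strings are shortened stand-ins, and the lowercase `action` value in the second one is an assumption about what the server returns for the lowercase URL; only the first variant is quoted in this thread.

```python
import difflib


def differing_lines(source_a: str, source_b: str):
    """Return only the lines that were removed ("- ") or added ("+ ")."""
    diff = difflib.ndiff(source_a.splitlines(), source_b.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Shortened stand-ins for the saved sources of the two URL variants.
page_a = """<html>
<body>
<form name="aspnetForm" method="post" action="TBBScatalogs.aspx" id="aspnetForm">
<p>Talking Book Catalogs</p>
</form>
</body>
</html>"""

# Assumed: the form action echoes the case used in the requested URL.
page_b = page_a.replace('action="TBBScatalogs.aspx"', 'action="tbbscatalogs.aspx"')

changes = differing_lines(page_a, page_b)
for line in changes:
    print(line)
```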

    URLs can be case sensitive; it depends on your server. So if you moved to a case-sensitive server one day, many of your links could break.

    I note that you are using <!--ZOOMSTOP--> tags on your pages. If you move the <!--ZOOMSTOP--> tag up a bit in your code so that the skipped section includes the varying <form> line, it should also fix the problem, as the CRC calculation skips sections in ZOOMSTOP tags.
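
To illustrate why moving the tag helps, here is a minimal Python sketch of a checksum that ignores everything between `<!--ZOOMSTOP-->` and `<!--ZOOMSTART-->`. It is only a model of the idea described above, not Zoom's actual implementation: the regular expression, the `content_crc` and `make_page` helpers, and the sample markup are all assumptions made for the example, and an unmatched ZOOMSTOP is simply left in place.

```python
import re
import zlib

# Assumed model: text from ZOOMSTOP up to the next ZOOMSTART is not checksummed.
SKIP_SECTION = re.compile(
    r"<!--ZOOMSTOP-->.*?<!--ZOOMSTART-->", re.DOTALL | re.IGNORECASE
)


def content_crc(html: str) -> int:
    """CRC-32 of the page after removing the skipped sections."""
    indexed = SKIP_SECTION.sub("", html)
    return zlib.crc32(indexed.encode("utf-8"))


def make_page(form_action: str, stop_before_form: bool) -> str:
    """Build a tiny sample page with the ZOOMSTOP tag before or after the form line."""
    form = (
        f'<form name="aspnetForm" method="post" '
        f'action="{form_action}" id="aspnetForm">'
    )
    stop = "<!--ZOOMSTOP-->"
    head = f"{stop}\n{form}" if stop_before_form else f"{form}\n{stop}"
    return (
        f"<body>\n{head}\n<div>menu</div>\n<!--ZOOMSTART-->\n"
        "<p>Talking Book Catalogs</p>\n</form>\n</body>"
    )


# Tag after the <form> line: the varying action attribute is checksummed.
crc_a_before = content_crc(make_page("TBBScatalogs.aspx", stop_before_form=False))
crc_b_before = content_crc(make_page("tbbscatalogs.aspx", stop_before_form=False))

# Tag moved above the <form> line: the varying line falls in the skipped section.
crc_a_after = content_crc(make_page("TBBScatalogs.aspx", stop_before_form=True))
crc_b_after = content_crc(make_page("tbbscatalogs.aspx", stop_before_form=True))

print(crc_a_before == crc_b_before)
print(crc_a_after == crc_b_after)
```

With the tag below the form line, the two variants produce different checksums; once the form line sits inside the skipped section, the checksums match and the pages can be treated as duplicates.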

  5. #5
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,784

    Default

    There will be an option to ignore upper/lowercase differences in URLs in the next major release (V7). This will be available in the next alpha (V7 Alpha 7).

    But as noted, URLs are technically case sensitive, so it's generally better web design practice to keep the case of your links consistent, as moving servers (or changes on your server) may cause your links to fail in the future.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
