Duplicates not suppressed

  • Duplicates not suppressed

    I've only recently started tweaking and configuring my site, so I hope I'm overlooking something obvious. I have enabled
    "reload all files (do not use cache)" and
    "Use CRC to skip files with identical content," but links to the same page that differ only in capitalization still show up as unique pages. I opened each one in IE, saved the source, and compared the files using FC, which reports that they are identical. They are .aspx pages, by the way, which makes me wonder whether some kind of "viewstate" tag is different, but that doesn't appear to be the case when I open the pages in the browser.

    Zoom is a fantastic product, and this is a relatively minor issue, but because I have so many of these links, I'm hoping it won't be necessary to edit links on hundreds of pages.

    Thanks for any help.

  • #2
    Filtering by CRC is an inefficient way to remove duplicate pages, as each page needs to be downloaded before its CRC can be calculated. It is better to filter on the URL using the page and folder skip list.

    So the best solution is to use consistent URLs on your site.

    What are the URLs of the two pages? We can take a look.



    • #3
      Examples - duplicate search results

      Thanks for your quick reply. It sounds like I need to tackle making links consistent. As you can imagine, though, this is difficult with dozens of staff editing web pages. A great solution (in our situation) would be an option to make URL duplicate detection case insensitive.

      In case you want to investigate why CRC isn't working, here is a case where the same page shows up 3 times:



      1. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

      ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

      Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/TBBScatalogs.aspx


      2. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

      ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

      Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/tbbscatalogs.aspx


      3. Talking Book Catalogs - Books, Magazines, and Descriptive Videos

      ... Nebraska Talking Book and Braille Service (TBBS) Online Public Access Catalog (OPAC ... : Current ONLINE edition (A comprehensive catalog of NLS and Nebraska-produced talking books, ...

      Terms matched: 2 - Score: 119 - URL: http://nlc.nebraska.gov/TBBS/TBBSCatalogs.aspx
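
Until links are made consistent, a small script could at least list which URLs differ only in capitalization. The following is a minimal sketch in Python, not a Zoom feature: the helper names `case_insensitive_key` and `group_case_variants` are hypothetical, and it assumes the link URLs have already been collected from the site by some other means.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def case_insensitive_key(url):
    """Return a comparison key that ignores case in scheme, host, and path.

    The query string and fragment are left as-is, since their values
    may be meaningful to the application (an assumption of this sketch).
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


def group_case_variants(urls):
    """Map each key to the distinct spellings found, keeping only keys
    that were written more than one way."""
    groups = defaultdict(set)
    for url in urls:
        groups[case_insensitive_key(url)].add(url)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


if __name__ == "__main__":
    links = [
        "http://nlc.nebraska.gov/TBBS/TBBScatalogs.aspx",
        "http://nlc.nebraska.gov/TBBS/tbbscatalogs.aspx",
        "http://nlc.nebraska.gov/TBBS/TBBSCatalogs.aspx",
    ]
    for key, variants in group_case_variants(links).items():
        print(key)
        for variant in variants:
            print("   ", variant)
```

Run against the three URLs above, this groups them under a single lowercased key, which gives a list of the links that would need to be edited.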



      • #4
        I found a difference between your pages: this line of HTML source code varies from one to the next.

        <form name="aspnetForm" method="post" action="TBBScatalogs.aspx" id="aspnetForm">

        URLs can be case sensitive; it depends on your server. So if you moved to a new server one day, many of your links could break.

        I note that you are using <!--ZOOMSTOP--> tags on your pages. If you move the <!--ZOOMSTOP--> tag up a bit in your code so that the skipped section includes the varying <form> line, that should also fix the problem, because the CRC calculation skips sections enclosed in ZOOMSTOP tags.
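
To illustrate why moving the tag helps, here is a rough Python sketch of a checksum that ignores skipped sections. It is not Zoom's actual implementation: the function name `content_crc` is hypothetical, the page snippets are made up, and it assumes a skipped section runs from <!--ZOOMSTOP--> to the next <!--ZOOMSTART--> tag (or to the end of the page if there is none).

```python
import re
import zlib

# Assumption: a skipped section starts at <!--ZOOMSTOP--> and ends at the
# next <!--ZOOMSTART-->, or at the end of the page if no closing tag follows.
SKIP_SECTION = re.compile(
    r"<!--ZOOMSTOP-->.*?(?:<!--ZOOMSTART-->|\Z)",
    re.DOTALL | re.IGNORECASE,
)


def content_crc(html):
    """CRC-32 of the page with skipped sections removed."""
    indexed = SKIP_SECTION.sub("", html)
    return zlib.crc32(indexed.encode("utf-8"))


# Two made-up pages that differ only in the case of the form's action URL.
BODY = "<p>Talking Book Catalogs</p>"
FORM_A = '<form name="aspnetForm" method="post" action="TBBScatalogs.aspx">'
FORM_B = '<form name="aspnetForm" method="post" action="tbbscatalogs.aspx">'

# Form line outside any skipped section: the checksums differ.
unwrapped_a = FORM_A + "\n" + BODY
unwrapped_b = FORM_B + "\n" + BODY

# Form line inside a skipped section: the checksums match.
wrapped_a = "<!--ZOOMSTOP-->\n" + FORM_A + "\n<!--ZOOMSTART-->\n" + BODY
wrapped_b = "<!--ZOOMSTOP-->\n" + FORM_B + "\n<!--ZOOMSTART-->\n" + BODY

if __name__ == "__main__":
    print(content_crc(unwrapped_a) == content_crc(unwrapped_b))
    print(content_crc(wrapped_a) == content_crc(wrapped_b))
```

With the form line left outside the skipped section, the two checksums differ and the pages count as distinct; once it is wrapped, the remaining content is byte-for-byte the same and the checksums agree.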



        • #5
          There will be an option to ignore upper/lowercase differences in URLs in the next major release (V7). This will be available in the next alpha (V7 Alpha 7).

          But as noted, URLs are technically case sensitive, so it's generally better web design practice to maintain case consistency with your links as moving servers (or changes on your server) may cause your links to fail in the future.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
