Avoiding duplicate pages


AllanP
05-15-2008, 12:52 PM
We have multiple links to the same page on our web site, and every one of them gets included in the index. :(

See http://www.southhams.gov.uk/ and search for 'listed' to see the issue.

Has anyone any solutions for this, apart from manually excluding the duplicates?

Thanks in advance for any ideas

AllanP

MergeThis
05-15-2008, 07:13 PM
The "same page" is in an entirely different directory:

ksp-development_and_planning-developmentcontrol/1app-planning-application-forms.htm

1app-national-forms-submitting-planning-apps/1app-planning-application-forms.htm


Good luck,
Leon

Ray
05-16-2008, 02:25 AM
They seem to be copies of the same file placed in different folders. Perhaps there are some folders or filenames that you would want to exclude from indexing on the "Skip Options" tab.

But if these files are 100% identical, you can easily skip them all by turning on the "Duplicate page detection - Use CRC to skip files with identical content" option on the "Scan Options" tab of the Configuration window.
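
To illustrate the idea behind CRC-based duplicate detection, here is a minimal Python sketch. It is not Zoom's actual implementation, only an assumption of how such a check could work: the function name `skip_duplicates` and the sample URLs and page bodies are hypothetical.

```python
import zlib


def skip_duplicates(pages):
    """Return the URLs of pages whose content CRC has not been seen before.

    `pages` is a list of (url, html) tuples. Hypothetical helper, not part
    of Zoom; it only shows the general technique.
    """
    seen = set()
    kept = []
    for url, html in pages:
        crc = zlib.crc32(html.encode("utf-8"))
        if crc in seen:
            # Byte-for-byte identical content was already indexed: skip it.
            continue
        seen.add(crc)
        kept.append(url)
    return kept


# Made-up sample data: two identical copies in different folders, one other page.
pages = [
    ("/folder-a/forms.htm", "<html><body>Planning forms</body></html>"),
    ("/folder-b/forms.htm", "<html><body>Planning forms</body></html>"),
    ("/folder-b/other.htm", "<html><body>Something else</body></html>"),
]
kept_urls = skip_duplicates(pages)
print(kept_urls)
```

Note that a check like this only catches pages that are 100% identical. Any difference at all in the markup gives a different CRC.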

AllanP
05-16-2008, 09:00 AM
Thanks for the input.
I should have said that the pages/links are generated by our CMS, which builds the navigation automatically, so there are no 'folders' as such.
I have Duplicate page detection turned on in Scan Options, but the CMS displays the page content in different templates depending on where in the navigation the link was followed. As far as the CRC is concerned, the result is a different page, because, for example, the banner may change colour and the breadcrumb will be different. I tried to exclude these differences using ZOOMSTOP and ZOOMRESTART tags, but I guess the duplication check ignores these. Oh well, back to manual removal I guess :)

AllanP

Ray
05-19-2008, 01:22 AM
You would be better off getting your CMS to not generate those duplicate pages if you do not want them.

Those URLs technically present themselves as different folders, regardless of whether they reflect true folders on the filesystem. Having almost identical content under many different URLs like that could possibly get your site penalized on Internet-wide search engines such as Google and Yahoo, as it is similar to a method that many sites use for keyword spam. Zoom will not penalize this.

V6 of Zoom will have an improved "duplicate page detection" method, which will exclude ZOOMSTOP and ZOOMRESTART sections.
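
As a rough sketch of what excluding ZOOMSTOP and ZOOMRESTART sections from a duplicate check could look like, the Python below strips the marked sections before computing the CRC. This is an illustration only, not the V6 implementation. It assumes the tags appear as HTML comments (`<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`); the function name `content_crc` and the sample pages are hypothetical.

```python
import re
import zlib

# Assumption: the skip markers are written as HTML comments in the page.
SKIP_SECTION = re.compile(
    r"<!--\s*ZOOMSTOP\s*-->.*?<!--\s*ZOOMRESTART\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def content_crc(html):
    """CRC32 of the page with every ZOOMSTOP..ZOOMRESTART section removed."""
    stripped = SKIP_SECTION.sub("", html)
    return zlib.crc32(stripped.encode("utf-8"))


# Made-up pages: same body, different banner and breadcrumb in the template.
page_a = (
    "<!--ZOOMSTOP--><div class='banner blue'>Home > Planning</div>"
    "<!--ZOOMRESTART--><p>Application forms</p>"
)
page_b = (
    "<!--ZOOMSTOP--><div class='banner green'>Home > 1App</div>"
    "<!--ZOOMRESTART--><p>Application forms</p>"
)

same_content = content_crc(page_a) == content_crc(page_b)
print(same_content)
```

With the template-specific parts wrapped in the markers, both pages reduce to the same body text, so they would be treated as duplicates even though the raw HTML differs.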