duplicate results -- any way to weed them out?


webmacher
03-07-2006, 07:34 PM
Hi,

Apologies if this has been asked before, but I couldn't find it discussed... We ran a test search and found that the same page shows up repeatedly in the search results, something like this:

About Our Company
Blah blah blah blah...
URL: http://www.oursite.com/about/index.html

About Our Company
Blah blah blah blah...
URL: http://www.oursite.com/about/

About Our Company
Blah blah blah blah...
URL: http://www.oursite.com/About/

About Our Company
Blah blah blah blah...
URL: http://www.oursite.com/About/index.html

Our tech guru says that I need to go through the entire site and make sure every link is exactly the same -- that the duplicates are happening because sometimes we link directly to the index page and sometimes just point to the folder, and because the case is sometimes upper and lower and sometimes just lower.

Before I do that (and it's not going to be easy to maintain this level of consistency in the future!), I'm wondering if there's a way to change the settings in this search engine to make it case-insensitive, and also to interpret a link to a folder and a link to the index page in that folder as the same thing. Am I making sense?

wrensoft
03-07-2006, 09:37 PM
Your tech guru is correct. Linux and Unix servers use case-sensitive file systems, which means the path part of a URL is case sensitive as well.

On most web hosts, /About/ and /about/ are different folders.
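
To illustrate which parts of a URL can safely be treated as case-insensitive, here is a minimal Python sketch. It is only an illustration, not anything Zoom does internally, and the function names comparison_key and same_url are made up for this example. It lowercases the scheme and host, which are case-insensitive, and leaves the path alone, because the server may treat /About/ and /about/ as different folders.

```python
from urllib.parse import urlsplit

def comparison_key(url):
    """Build a key that ignores case only where URLs are case-insensitive."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive; .hostname is already lowercased.
    # The path and query are kept exactly as written.
    return (parts.scheme.lower(), parts.hostname, parts.port, parts.path, parts.query)

def same_url(a, b):
    """True only if the two URLs differ at most in scheme or host case."""
    return comparison_key(a) == comparison_key(b)
```

With this, same_url("HTTP://WWW.OURSITE.COM/about/", "http://www.oursite.com/about/") is true, while /About/ and /about/ on the same host still compare as different.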

"it's not going to be easy to maintain this level of consistency in the future"

Yes, if you are coding pages by hand, you need to be careful. If you are using a tool like Dreamweaver, then your links should match the file name every time without much effort.

Or you can do what a lot of web designers do: make a rule that every file and every link must have a lower-case name.
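
If you want to check an existing site against a rule like that, a small script can flag the links that break it. The following Python sketch is one possible approach, under a few assumptions: the index page is named index.html, only <a href="..."> links matter, and the names LinkCollector and find_inconsistent_links are invented for this example. It reports links whose path contains upper-case letters and links that point at the index page explicitly instead of at the folder.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

INDEX_NAME = "index.html"  # assumption: the site's default index page name

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def find_inconsistent_links(html_text):
    """Return (href, reason) pairs for links that break the naming rule."""
    collector = LinkCollector()
    collector.feed(html_text)
    problems = []
    for href in collector.hrefs:
        path = urlsplit(href).path
        if path != path.lower():
            problems.append((href, "mixed case in path"))
        # The last path segment is empty for folder links such as /about/
        if path.rsplit("/", 1)[-1].lower() == INDEX_NAME:
            problems.append((href, "explicit index page"))
    return problems
```

Run over each page's HTML, this gives a list of links to fix, which is easier than checking every link by eye.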

What you can do in Zoom is turn on CRC-32 checksum duplicate page checking (from the config window / scan options tab). This should solve some or all of your problem, but it will make indexing slower than fixing the root cause would.
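
For anyone curious how checksum-based duplicate checking works in general, here is a rough Python sketch. It is not Zoom's actual implementation, just the general idea, and the function name drop_duplicate_pages and its input format are assumptions for this example. Each page's content is run through CRC-32, and a page is skipped if an earlier page produced the same checksum. Presumably the extra cost comes from the fact that a duplicate still has to be downloaded and checksummed before it can be thrown away.

```python
import zlib

def drop_duplicate_pages(pages):
    """Keep the first URL seen for each distinct page content.

    pages: iterable of (url, content_bytes) pairs, in crawl order.
    """
    seen = set()      # checksums of pages already kept
    unique = []
    for url, content in pages:
        checksum = zlib.crc32(content)
        if checksum in seen:
            continue  # same content as an earlier page, so skip it
        seen.add(checksum)
        unique.append(url)
    return unique
```

Note that this only catches pages whose content is byte-for-byte identical, and that CRC-32 is a 32-bit checksum, so two different pages can in rare cases collide and one of them would be skipped by mistake.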

-------
David