Same page showing multiple times with double slash


len smith
07-19-2007, 07:27 PM
I wonder if anyone can help solve my mystery: when a search word is entered in my search, e.g. Vwadyck Zendera (I have used this as it is an uncommon name shown on the site), the results show the same page 3 times. It happens with all searches.
The URL for my site is worldwar2exraf.co.uk
Would appreciate some help
Thanks
Len Smith

WizardFusion
07-19-2007, 08:41 PM
The URLs of your results give...

http://www.worldwar2exraf.co.uk/Ground%20Crew%20Notice%20Board/Page%2036.htm
http://www.worldwar2exraf.co.uk//Ground%20Crew%20Notice%20Board/Page%2036.htm
http://www.worldwar2exraf.co.uk///Ground%20Crew%20Notice%20Board/Page%2036.htm
http://www.worldwar2exraf.co.uk////Ground%20Crew%20Notice%20Board/Page%2036.htm

Notice the extra slashes. Do you have multiple copies of the same file?

len smith
07-19-2007, 08:57 PM
No, there is only one file for each page. I noticed the extra slashes too and checked to make sure, but everything is as it should be. That's what I can't understand: no matter what you enter on any page, it still shows the same page 3 times. Very strange!
Regards
Len Smith

wrensoft
07-19-2007, 10:42 PM
As the URLs are different they are treated as different pages.

The problem is that the same page is being indexed multiple times at different URLs. This, in turn, is a result of you having a (slightly) broken link on your web site. You probably have a relative link somewhere on your site that has a double slash in the URL. The page reached through the double slash then links to a page with a triple slash, and so on: an infinite loop.

So the possible solutions are:

1) Fix the broken link, although finding it might take a small amount of detective work.

2) Turn on CRC duplicate page checking in Zoom.

Option 1) is a much better solution than 2) as option 2) needs to download the page before it can work out it is a duplicate, plus option 2) might fail in any case if some of your content is dynamically changing.
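
To illustrate the trade-off, here is a minimal Python sketch of checksum-based duplicate detection. It is not Zoom's actual implementation; the function name `find_duplicate_pages` and the example.com URLs are made up for illustration. Note that the page bodies must already be downloaded before they can be compared, and that a page whose content changes between requests (a clock, a counter) yields a different checksum each time.

```python
import zlib


def find_duplicate_pages(pages):
    """Return (duplicate URL, original URL) pairs for bodies with equal CRC32.

    `pages` maps URL -> downloaded page body (bytes).
    Hypothetical helper for illustration, not Zoom's code.
    """
    seen = {}        # checksum -> first URL seen with that checksum
    duplicates = []
    for url, body in pages.items():
        checksum = zlib.crc32(body)
        if checksum in seen:
            duplicates.append((url, seen[checksum]))
        else:
            seen[checksum] = url
    return duplicates


# Same body served at two URLs: the second one is flagged as a duplicate.
static_pages = {
    "http://www.example.com/Page%2036.htm": b"<html>Notice board</html>",
    "http://www.example.com//Page%2036.htm": b"<html>Notice board</html>",
    "http://www.example.com/other.htm": b"<html>Other page</html>",
}
print(find_duplicate_pages(static_pages))

# Same page, but the body changes on every request: nothing is flagged.
dynamic_pages = {
    "http://www.example.com/clock.htm": b"<html>12:00:01</html>",
    "http://www.example.com//clock.htm": b"<html>12:00:02</html>",
}
print(find_duplicate_pages(dynamic_pages))
```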

If you can't find the broken link, let us know. We might be able to find it for you if your site is not too large.

Ray
07-20-2007, 02:37 AM
I recently helped another user with a similar problem, so I'll post my tips from my original e-mail below. This should give you (and anyone else who is having this problem) a guide as to how you can use Zoom to find your broken links, and in this particular case, find the links with the extra slash that is making your website spider unfriendly.

Extract from my e-mail below:


... in situations like this, you can configure Zoom to help you
locate these problems in your website. This is what you do:

1) On the "General" tab of the Configuration window, set Zoom to
"Single-threaded downloading". This makes it a lot clearer
which link came from where in your log.

2) If you are making changes as you go, it probably helps to check the
"Reload all files (do not use cache)" option to avoid indexing cached pages.

3) On the "Index Log" tab, make sure you have the following boxes enabled:
Indexing, Spidering, Initialization, Downloading, Information, Error,
Warning, Plugin, Summary, Broken Links.

4) Turn on "Save index log to file" and specify a place for the log file to
be written to.

5) Enable "Debug mode" so that the log file will be written out as indexing
goes.

Now when you re-index the site, a log text file will be created with all the
index messages. Once it gets to a point where the looping occurs, you can
stop it, and open the log file in Notepad or any text editor. This will
allow you to browse through the log in more detail.

You will find "Queued URL: ..." messages in the log, which immediately
follow the "Spidering for ..." message for the page they were found on. So, in
doing the above, I found the following as the first occurrence of a URL
containing ".uk//":

07/05/07 10:39:24 - Spidering for links on
http://www.mysite.co.uk/main/cycleracks_rear2-3.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroclassic_mof.html
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk//main/backup_box.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroway_mof.html

As we can see here, this double slashed link was found on the
cycleracks_rear2-3.htm page. And going to that page and looking for that
link, indeed, we see this in the HTML source:

903 can also be used to carry the Thule <a
href="..//main/backup_box.htm">BackUp
luggage box</a><font face="Arial, Helvetica, sans-serif">. <br />


So this is one of the problems to fix. It might even be the cause of all the
other links, because the spider will proceed to that URL later on, and your
server will respond as if it were a unique page:

07/05/07 10:40:32 - Downloading file
http://www.mysite.co.uk//main/backup_box.htm
07/05/07 10:40:33 - Spidering for links on
http://www.mysite.co.uk//main/backup_box.htm
07/05/07 10:40:33 - Queued URL:
http://www.mysite.co.uk//main/specials.htm
07/05/07 10:40:33 - Queued URL:
http://www.mysite.co.uk//main/cycleracks_rear2-3.htm

Now because of the relative links on that page (links to "specials.htm"
etc.) it will think this is a new path and all these links are unique, and
cause a cascading effect - where the entire website is re-indexed with two
slashes - which will lead to then indexing with 3 slashes, etc.
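
The cascade can be reproduced with a short Python sketch. It hand-implements the relative-reference resolution from RFC 3986 (sections 5.2.3 and 5.2.4) for simple relative paths only, so no query strings or absolute URLs in the href; the helper names are made up for illustration, and this is not how Zoom itself is written. The point is that an empty path segment (the extra slash) survives resolution, so each pass through the faulty link adds one more slash.

```python
from urllib.parse import urlsplit


def remove_dot_segments(path):
    """Resolve "." and ".." per RFC 3986 section 5.2.4 (empty segments are kept)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def resolve(base_url, href):
    """Resolve a simple relative-path href against base_url (illustrative only)."""
    base = urlsplit(base_url)
    if href.startswith("/"):
        merged = href
    else:
        # Replace everything after the last "/" of the base path with href.
        merged = base.path[: base.path.rfind("/") + 1] + href
    return f"{base.scheme}://{base.netloc}{remove_dot_segments(merged)}"


page = "http://www.mysite.co.uk/main/cycleracks_rear2-3.htm"
chain = []
for _ in range(3):
    broken = resolve(page, "..//main/backup_box.htm")  # the faulty link
    chain.append(broken)
    page = resolve(broken, "cycleracks_rear2-3.htm")   # ordinary link back
for url in chain:
    print(url)
```

Each URL in `chain` has one more slash after the host name than the previous one, which is the pattern seen in the duplicated search results.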

I can't confirm that this is the only occurrence of the double slash on your
site without modifying the page on your website and repeating the above
process. So I would recommend you do that, and see how you go from there.
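
For a large log, the manual search described above could be scripted. The following Python sketch assumes the log looks like the excerpts quoted in this post: a timestamp, " - ", then a "Spidering for links on" or "Queued URL:" message, with the URL possibly wrapped onto the next line. The real log layout may differ, and the function names are made up for illustration.

```python
import re

# Assumed timestamp prefix, e.g. "07/05/07 10:39:24 - "
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} - ")


def log_entries(lines):
    """Yield log messages, joining wrapped continuation lines onto their entry."""
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line):
            if current is not None:
                yield current
            current = TIMESTAMP.sub("", line, count=1)
        elif current is not None:
            current += " " + line
    if current is not None:
        yield current


def find_slash_links(lines):
    """Return (source page, queued URL) pairs where the URL has an extra "//"."""
    source = None
    hits = []
    for entry in log_entries(lines):
        if entry.startswith("Spidering for links on"):
            source = entry[len("Spidering for links on"):].strip()
        elif entry.startswith("Queued URL:"):
            url = entry[len("Queued URL:"):].strip()
            # Ignore the "//" that follows the scheme, then look for another one.
            if "//" in url.split("://", 1)[-1]:
                hits.append((source, url))
    return hits


sample_log = """\
07/05/07 10:39:24 - Spidering for links on
http://www.mysite.co.uk/main/cycleracks_rear2-3.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroclassic_mof.html
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk//main/backup_box.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroway_mof.html
"""
hits = find_slash_links(sample_log.splitlines())
for source, url in hits:
    print(f"{url}\n    found on {source}")
```

The first hit is the one to fix; later hits are usually just the cascade from it. To scan a saved log file, pass the open file object instead of `sample_log.splitlines()`.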

len smith
07-20-2007, 02:40 PM
Thanks for that, guys.
Well, I have checked and rechecked, and although everything points to several broken links, when I look at the links shown they all seem to be working perfectly with no problems at all. I am wondering if it is because I use Pop menu magic for my navigation, as the pages that show as broken links are the ones that use this navigation system.
All very strange indeed.
Any help or solutions would be appreciated, paid or otherwise.
regards
len smith

len smith
07-20-2007, 02:57 PM
Now if I had tried your option 2 in the first place (turn on CRC) I could have saved myself endless hours of searching. Have just tried this option and guess what? It works fine now.
Thanks once again. Regards
Len Smith

wrensoft
07-21-2007, 08:47 AM
I had a look at your site. You have thousands of pages on the site, so it was a little like looking for a needle in a haystack.

Nevertheless I found at least one (partially) broken link which would cause this problem.

On this page,
http://www.worldwar2exraf.co.uk/Photo%20Gallery%202207/Aircrew%202007/aircrew%20single%206.html

You have this HTML code,

<a href="../..//Aircrew Notice Board/aircrew notice board 168.html#1608">View details on Notice Board </a>

As expected it was a relative link with a double slash in the link.
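
On a site with thousands of pages, a search like this could be automated against a local copy of the HTML files. Below is a minimal Python sketch using only the standard library; the names `has_extra_slash`, `suspect_links_in` and `scan_site` are made up for illustration, and it only looks at `href` attributes of `<a>` tags, so links generated by scripts or menu systems would be missed.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


def has_extra_slash(href):
    """True if the path part of href contains "//".

    The "//" after a scheme ("http://") or at the very start of the href
    is not counted, and the query string and fragment are ignored.
    """
    path = href.split("#", 1)[0].split("?", 1)[0]
    path = re.sub(r"^([a-zA-Z][a-zA-Z0-9+.-]*:)?//", "", path)
    return "//" in path


class SlashLinkFinder(HTMLParser):
    """Collects href values of <a> tags that contain an extra slash."""

    def __init__(self):
        super().__init__()
        self.suspect_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and has_extra_slash(value):
                self.suspect_links.append(value)


def suspect_links_in(html_text):
    finder = SlashLinkFinder()
    finder.feed(html_text)
    return finder.suspect_links


def scan_site(root):
    """Print every suspect link in the .htm/.html files under a local folder."""
    for path in Path(root).rglob("*.htm*"):
        for link in suspect_links_in(path.read_text(errors="replace")):
            print(f"{path}: {link}")


sample = (
    '<a href="../..//Aircrew Notice Board/aircrew notice board 168.html#1608">'
    "View details on Notice Board </a>"
    '<a href="http://www.worldwar2exraf.co.uk/index.html">Home</a>'
)
print(suspect_links_in(sample))
```

Calling `scan_site` on the folder that holds the site's source files would list each file together with the offending link.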

len smith
07-21-2007, 01:41 PM
Thanks for that. Yep, there are approximately 15,000+ pages on the site, and it is growing all the time due to the number of people who use the site for posting details. As you say, like looking for a needle in a haystack.
Thanks for finding that page.
Have to say, though, your second suggestion of turning on CRC seems to work just fine for the site.
Thanks for that. It really pulled me out of a hole
Best Regards
Len

len smith
07-21-2007, 03:39 PM
Hi to Wrensoft
It seems there were a few more links like that on the same page you mentioned, which I have now put right. I am wondering how many more I have missed. The silly thing is the links still worked, too.
I do appreciate the time you took over this, I see you spent well over an hour on the site looking for the problem.

Thanks for that.
Regards
Len Smith

Ray
07-25-2007, 07:45 AM
Seeing how difficult it is to hunt for such problems on your website (and the potential trouble you will have when indexing a site with such problems), we decided to add some functionality in the latest build of Zoom which will strip out the multiple slashes in URLs, and thus prevent duplicate pages being indexed because of this.

You can download the latest build with this new feature (V5.1.1003) here:
http://www.wrensoft.com/zoom/whatsnew.html

We should note, however, that we still recommend that website designers avoid such linking issues and fix any existing problems. They will cause similar issues with other search engines and may potentially render your website search engine unfriendly (a little while back, Google was also showing pages with multiple slashes in their URLs, and it would rank down a site because of this, thinking it had too many duplicate pages).
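
As a rough idea of what stripping multiple slashes from a URL involves, here is a minimal Python sketch. It is only an illustration and not the code used in Zoom; the function name `collapse_slashes` is made up. Only the path is normalized, so the "//" after the scheme and any slashes inside the query string are left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of "/" in the path of a URL into a single "/"."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


results = [
    "http://www.worldwar2exraf.co.uk/Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk//Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk///Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk////Ground%20Crew%20Notice%20Board/Page%2036.htm",
]
unique = {collapse_slashes(url) for url in results}
print(unique)
```

After normalization the four result URLs from the start of this thread become one, so the page would be queued and indexed only once.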

len smith
07-27-2007, 02:17 PM
Thanks for that, Ray.
After this reply I will be downloading the latest build.
What stars you are at Wrensoft.
My problem is that not only do I have thousands of pages, but each page also contains up to 100 links (sometimes more).
Thankfully we rate very high with Google, so broken links don't seem to be too much of a problem with the site.
I do appreciate the time you have spent on my site trying to help
Regards
Len Smith

len smith
07-29-2007, 02:49 PM
works like a dream
thanks for that
Len Smith