Wrensoft Forums for Zoom Search > Zoom Search Engine V5 (Old Version)

  #1  
12-05-2006, 01:08 PM
Mantis
Junior Member
Posts: 9
//index.php - Site indexed twice with double slashes

Hi.

I just purchased Zoom Search Engine and absolutely love it!

One thing I noticed, though, is that the indexer indexes all my pages twice: once as http://www.mydomain.com/index.php?id=xx and once as http://www.mydomain.com//index.php?id=xx. I am using the Etomite CMS for my website.

First, I'd like to know why the indexer scans my site twice. Second, I'd prefer not to have my pages indexed as //index.php URLs, so that the links stay uniform.

Any ideas?

Thanks in advance!
  #2  
12-05-2006, 06:35 PM
wrensoft
Administrator
Location: Sydney
Posts: 3,369

Zoom follows all the links it can find. Somewhere on your site you probably have a bad link, maybe just a typo, that contains a double //. It might even be a bug in the Etomite CMS software.

Once the bad link is encountered, it is followed, and lots of new links will then be generated by your site with double slashes. In the end your entire site will be indexed twice.
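To see why one bad link is enough, here is a minimal Python sketch of how a relative link such as index.php?id=2 gets resolved against the page it appears on, using the RFC 3986 "merge" rule (keep the base path up to and including its last slash, then append the link). It is a simplified illustration, not Zoom's code: it only handles relative-path links without ../ segments, and the URLs are the placeholder domain from this thread.

```python
from urllib.parse import urlsplit


def resolve_relative(base_url: str, href: str) -> str:
    """Resolve a relative-path link (no scheme, no leading '/', no '../')
    against base_url using the RFC 3986 merge rule: keep everything in the
    base path up to and including its last '/', then append the link."""
    parts = urlsplit(base_url)
    directory = parts.path[: parts.path.rfind("/") + 1]
    ref_path, sep, ref_query = href.partition("?")
    url = f"{parts.scheme}://{parts.netloc}{directory}{ref_path}"
    return url + (sep + ref_query if sep else "")


# A good page produces a good link: http://www.mydomain.com/index.php?id=2
print(resolve_relative("http://www.mydomain.com/index.php?id=1", "index.php?id=2"))
# Once the crawler is on a "//" page, the relative link keeps the "//":
# http://www.mydomain.com//index.php?id=2
print(resolve_relative("http://www.mydomain.com//index.php?id=1", "index.php?id=2"))
```

The page itself is perfectly ordinary; the double slash is simply inherited from the URL the crawler arrived on, which is how a single bad link ends up duplicating the whole site.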

We have even seen cases where the site turns into an infinite loop, with one more slash added on every pass:
1st pass: http://www.mydomain.com/index.php
2nd pass: http://www.mydomain.com//index.php
3rd pass: http://www.mydomain.com///index.php
...
10th pass: http://www.mydomain.com//////////index.php
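The loop above can be reproduced with a small Python simulation. The link generator here is hypothetical: it assumes a CMS bug that builds each link as the site URL + "/" + the current path, and since that path already starts with "/", every hop adds one slash. The crawler is a generic breadth-first crawl, not Zoom's.

```python
from collections import deque
from urllib.parse import urlsplit

SITE = "http://www.mydomain.com"


def buggy_links(url: str) -> list[str]:
    """Hypothetical CMS bug: each page links to SITE + "/" + its own path.
    The path already starts with "/", so each hop adds one more slash."""
    return [SITE + "/" + urlsplit(url).path]


def crawl(start: str, max_pages: int) -> list[str]:
    """Breadth-first crawl with a visited set and a hard page limit."""
    seen: set[str] = set()
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue  # never triggers here: every generated URL is new
        seen.add(url)
        visited.append(url)
        queue.extend(buggy_links(url))
    return visited


for n, url in enumerate(crawl(SITE + "/index.php", max_pages=10), start=1):
    print(n, url)
```

The visited set offers no protection, because each extra slash makes a URL the crawler has never seen before. Only a page limit (max_pages in this sketch) stops it.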

There are two solutions:

1) Turn on full logging in Zoom and find the bad link by looking through the log. This is the best solution.

2) Hide the problem by adding
.com//
to your page skip list in Zoom.
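For option 1, the goal is to find the first place a double-slash URL appears in the log. Zoom's exact log format isn't shown in this thread, so this Python sketch assumes only that log lines contain plain http(s) URLs. It extracts every URL and reports those whose path contains //. The sample log lines are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Anything that looks like an http(s) URL, up to whitespace or a quote/bracket.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def double_slash_urls(log_lines):
    """Yield (line_number, url) for each URL whose path contains '//'.

    The '//' after 'http:' is the scheme separator, not part of the path,
    so urlsplit() is used to check the path only."""
    for lineno, line in enumerate(log_lines, start=1):
        for url in URL_RE.findall(line):
            if "//" in urlsplit(url).path:
                yield lineno, url


# Made-up sample lines; a real Zoom log will look different.
sample_log = [
    "Indexing http://www.mydomain.com/index.php?id=1",
    "Link found: http://www.mydomain.com//index.php?id=2",
]
for lineno, url in double_slash_urls(sample_log):
    print(f"line {lineno}: {url}")
```

The earliest hit is the one to investigate. Whatever page the crawler was processing just before that point is the likely source of the bad link.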
  #3  
12-06-2006, 11:17 AM
Mantis
Junior Member
Posts: 9

Apparently, selecting "use CRC to skip files with identical content" also did the trick.

Thanks anyway.
  #4  
12-06-2006, 07:02 PM
wrensoft
Administrator
Location: Sydney
Posts: 3,369

Yes, that would also fix the problem. I didn't suggest it because it is an inefficient solution. Firstly, you are only hiding the problem rather than fixing it. Secondly, the CRC filtering can only happen after the page is downloaded, so you are still downloading every page twice.

Filtering on the URL via the skip list is a more efficient way of hiding the problem.
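The difference in cost can be illustrated with a small Python simulation. It assumes a hypothetical site where each page is reachable under both a / and a // URL. It treats the skip list as a plain substring match on the URL, which is an assumption about how Zoom applies it, and it uses zlib.crc32 as a stand-in for Zoom's CRC check, whose details aren't given here.

```python
import zlib

# Hypothetical site: pages 1-5, each reachable under a "/" URL and a "//" duplicate.
PAGES = {
    f"http://www.mydomain.com{slashes}index.php?id={i}": f"<html>page {i}</html>"
    for i in range(1, 6)
    for slashes in ("/", "//")
}


def crawl_with_skip_list(pages, skip_list):
    """Reject URLs by substring match *before* downloading them."""
    downloads, indexed = 0, []
    for url in pages:
        if any(entry in url for entry in skip_list):
            continue  # rejected on the URL alone, never downloaded
        _body = pages[url]  # simulated download
        downloads += 1
        indexed.append(url)
    return downloads, indexed


def crawl_with_crc(pages):
    """Download everything, then drop pages whose CRC32 was already seen."""
    downloads, indexed, seen_crcs = 0, [], set()
    for url in pages:
        body = pages[url]  # simulated download: paid for every URL
        downloads += 1
        crc = zlib.crc32(body.encode("utf-8"))
        if crc in seen_crcs:
            continue  # duplicate content, detected only after the download
        seen_crcs.add(crc)
        indexed.append(url)
    return downloads, indexed


skip_downloads, skip_indexed = crawl_with_skip_list(PAGES, [".com//"])
crc_downloads, crc_indexed = crawl_with_crc(PAGES)
print("skip list:", skip_downloads, "downloads,", len(skip_indexed), "pages indexed")
print("CRC:", crc_downloads, "downloads,", len(crc_indexed), "pages indexed")
```

Both approaches end up indexing the same 5 pages. The skip list, however, needs only 5 downloads, while the CRC check needs all 10.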

All times are GMT.