Problem with redirected sites


  • Problem with redirected sites

    Hello,

    I have a problem with spider mode. I enter a URL as the start point (with the options to follow internal and external links enabled), and that URL redirects to another site. Due to internal company restrictions I cannot access the target URL directly.

    Unfortunately the spider only indexes the page with the redirection, not the target site. Is there a setting to force the spider to wait for the redirection and index the target site?

    Thanks in advance for an answer.

    Jens

  • #2
    You need to change the base URL to include the domains of both the first and second sites, separated by a semicolon. If you don't do this, the second site will appear to be an external site and will not be fully indexed. See also the User's Guide for more details about the base URL.
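
To illustrate why the base URL matters, here is a minimal sketch of prefix matching against a semicolon-separated base URL list. The function name `matches_base_url` and the example hostnames are hypothetical, and the matching rule itself is an assumption made for illustration, not Zoom's actual implementation; the User's Guide describes the real behaviour.

```python
def matches_base_url(url: str, base_urls: str) -> bool:
    """Return True if url starts with any of the semicolon-separated base URLs.

    Assumption: a simple case-insensitive prefix comparison. The real
    indexer may apply different rules.
    """
    prefixes = [p.strip() for p in base_urls.split(";") if p.strip()]
    return any(url.lower().startswith(p.lower()) for p in prefixes)


# Hypothetical hosts standing in for the first and second sites.
base = "http://first.example.com/;http://second.example.com/"
print(matches_base_url("http://second.example.com/page.html", base))
print(matches_base_url("https://other.example.com/", base))
```

Under this assumed rule, a link to the second site is only treated as internal when its domain is listed in the base URL; anything else is skipped as external.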



    • #3
      Hi,

      Thanks for your reply. Unfortunately I still have the same problem. Adding or changing the base URL has no effect.

      Maybe the problem is related to the site I want to spider. It is a Java-based portal (like this forum). When I enter the start point in the browser, it redirects me to the site I want to index, but Zoomsearch only scans the redirecting page.

      Here is the log:

      02/03/10 08:15:13 - Initiating HTTP session (thread #1) ...
      02/03/10 08:15:13 - DL Thread #1, got URL (http://server.de.xyz.com:8080/pkit/go/process/element.do?elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1) off queue
      02/03/10 08:15:13 - Downloading file http://server.de.xyz.com:8080/pkit/go/process/element.do?elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1
      02/03/10 08:15:13 - Index Thread got ready buffer for http://server.de.xyz.com:8080/pkit/go/process/element.do?elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1 (Content-type: HTML text)
      02/03/10 08:15:13 - Spidering for links on http://server.de.xyz.com:8080/pkit/go/process/element.do?elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1
      02/03/10 08:15:13 - Queued URL: http://server.de.xyz.com:8080/pkit/go/process/element.do?redirected=true&elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1
      02/03/10 08:15:13 - Indexing http://server.de.xyz.com:8080/pkit/go/process/element.do?elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1
      02/03/10 08:15:13 - DL Thread #1, got URL (http://server.de.xyz.com:8080/pkit/go/process/element.do?redirected=true&elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1) off queue
      02/03/10 08:15:13 - Downloading file http://server.de.xyz.com:8080/pkit/go/process/element.do?redirected=true&elementType=Activity&elementName=Workinstructions%7CEngineering%7CSW%20Engineering%7CAA%20ASIL%20SW%20Development%20Guidelines&projectName=Root%7CLibs%7CTransmission%20ECU&anon=1
      02/03/10 08:15:13 - URL redirected to: http://server.de.xyz.com:8080/pkit/main.do;jsessionid=9493EC977D92DEF2DA99D6D81D28570C [thread #1]
      02/03/10 08:15:13 - Queued URL: http://server.de.xyz.com:8080/pkit/main.do
      02/03/10 08:15:13 - DL Thread #1, got URL (http://server.de.xyz.com:8080/pkit/main.do) off queue
      02/03/10 08:15:13 - Downloading file http://server.de.xyz.com:8080/pkit/main.do
      02/03/10 08:15:13 - Index Thread got ready buffer for http://server.de.xyz.com:8080/pkit/main.do (Content-type: HTML text)
      02/03/10 08:15:13 - Spidering for links on http://server.de.xyz.com:8080/pkit/main.do
      02/03/10 08:15:13 - Skipping https://bgn.xyz.com/alias/gs (External site - does not match base URL)
      02/03/10 08:15:13 - Skipping http://server.de.xyz.com:8080/pkit/local/img/image/logo/logo.jpg (Blocked by extensions list)
      02/03/10 08:15:13 - Indexing http://server.de.xyz.com:8080/pkit/main.do
      02/03/10 08:15:13 - All index files will be written to: C:\Documents and Settings\...

      Do you have any other idea what's wrong?

      Kind regards,

      Jens



      • #4
        "It is a Java based portal (like this forum)."
        This forum is not Java based. In fact, it doesn't use any Java at all.

        There is a single redirect in the log file, this one:
        Downloading http://server.de.xyz.com:8080/pkit/go/process/element.do....
        URL redirected to: http://server.de.xyz.com:8080/pkit/main.do...
        Indexing http://server.de.xyz.com:8080/pkit/main.do...

        This is a redirect within the same site (your site I assume). So I don't understand your description of the issue where you talk about 2 sites. There is only 1 site.
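
The check described here can be sketched in a few lines: pull each redirect out of the log and compare the host of the source and target URLs. The helper names `redirects` and `same_site`, and the shortened example URLs, are hypothetical. The sketch assumes a single download thread, so that each "URL redirected to:" line belongs to the most recent "Downloading file" line, and URLs without embedded spaces.

```python
import re
from urllib.parse import urlsplit

# Matches the two log line types of interest, as they appear in the log above.
LOG_LINE = re.compile(r"Downloading file (\S+)|URL redirected to: (\S+)")


def redirects(log_lines):
    """Yield (source_url, target_url) for each redirect found in the log."""
    last_download = None
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        if m.group(1):
            last_download = m.group(1)
        elif last_download:
            yield last_download, m.group(2)


def same_site(a: str, b: str) -> bool:
    """Compare host and port only; paths and session IDs are ignored."""
    return urlsplit(a).netloc.lower() == urlsplit(b).netloc.lower()


# Shortened, hypothetical log lines in the same format as the real log.
log = [
    "02/03/10 08:15:13 - Downloading file http://server.example.com:8080/pkit/go/process/element.do?redirected=true",
    "02/03/10 08:15:13 - URL redirected to: http://server.example.com:8080/pkit/main.do;jsessionid=ABC [thread #1]",
]
for src, dst in redirects(log):
    print(same_site(src, dst), src, "->", dst)
```

If every pair reports the same host, the redirect stays within one site and the base URL does not need a second domain.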

        Further, you stated that Zoom is indexing the first site and not the target site, but the log shows Zoom indexing the target of the redirect, as it should.

        So I really don't understand the problem you have.

        "Due to company internal issues I cannot access the target URL directly"
        Why? You clearly can access the target site, as you stated you can see it in a browser window.
