If I add more sites, Zoom indexer doesn't scan all sites?

  • If I add more sites, Zoom indexer doesn't scan all sites?

    I am trying to add a couple of domains to scan, but sometimes Zoom says "Invalid url or already scanned". If I scan just that URL on its own, everything works. Why?

  • #2
    I assume these are additional start points added by clicking on the "More" button on the Spider mode tab.

    Note that if a previous start point has already scanned a URL, it will not be scanned again, even if you specify it as another start point. For example, suppose your first start point is set to "Index page and follow internal and external links" and, whilst indexing that start point, the spider comes across a URL (e.g. http://myothersite.com/index.html) and indexes it because external links are being followed.

    If you then have a second start point at http://myothersite.com/index.html, it will be ignored, because the page has already been indexed by the first start point. This may not be what you want if you expected more than just that first page to be indexed from the second start point. In that case, you can either change the setting of your first start point (so that it does not index external links) or rearrange the order of your start points to prevent this from happening.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
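
The skipping behaviour described in #2 can be illustrated with a toy model. This is only a minimal sketch and not Zoom's actual code: the link graph, the `crawl` function, the `follow_external` flag and the mysite.com URLs are all hypothetical, and it assumes that an external page reached through a link is indexed but not crawled any further (which is what "just the first page" above suggests).

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph (page -> pages it links to); not real site data.
LINKS = {
    "http://mysite.com/index.html": [
        "http://mysite.com/about.html",
        "http://myothersite.com/index.html",
    ],
    "http://mysite.com/about.html": [],
    "http://myothersite.com/index.html": ["http://myothersite.com/page2.html"],
    "http://myothersite.com/page2.html": [],
}

SITE1 = "http://mysite.com/index.html"
SITE2 = "http://myothersite.com/index.html"


def host(url):
    """Return the host part of a URL."""
    return urlsplit(url).netloc


def crawl(start_points, links):
    """Crawl start points in order, sharing one 'already scanned' set.

    start_points is a list of (url, follow_external) pairs.
    Returns (indexed_urls, skipped_start_points).
    """
    visited = set()
    indexed = []
    skipped = []
    for start, follow_external in start_points:
        if start in visited:
            # An earlier start point already reached this URL, so it is ignored.
            skipped.append(start)
            continue
        visited.add(start)
        indexed.append(start)
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for target in links.get(page, []):
                if target in visited:
                    continue
                external = host(target) != host(start)
                if external and not follow_external:
                    continue
                visited.add(target)
                indexed.append(target)
                # Assumption: external pages are indexed but not crawled further.
                if not external:
                    queue.append(target)
    return indexed, skipped


scenarios = {
    "external links on, site 1 first": [(SITE1, True), (SITE2, False)],
    "external links on, site 2 first": [(SITE2, False), (SITE1, True)],
    "internal links only": [(SITE1, False), (SITE2, False)],
}
for label, start_points in scenarios.items():
    indexed, skipped = crawl(start_points, LINKS)
    print(f"{label}: {len(indexed)} pages indexed, skipped start points: {skipped}")
```

In this model only the first scenario skips the second start point and misses page2.html; reordering the start points or turning off external links in the first one avoids the skip, which mirrors the two workarounds suggested above.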


    • #3
      The problem is this: I want to index the following sites (internal links only):


      spider url: site1.com/list-info.php
      base url: site1.com/info/

      &

      spider url: site2.com/info/
      base url: site2.com/info/

      Then it says that site2.com/info has already been indexed or is an invalid URL, but if I scan only site2.com, there are no problems.

      Do you understand my problem? (Sorry for my bad English.)


      • #4
        Send us your ZCFG file (via email - see our Contact Us page) and we'll take a look at it.

        It would also help if you save your index log after indexing (click on "File" -> "Save index log to file") and send us this as well. Make sure you have turned on "Verbose mode" before indexing so that we'll see all skip messages.
        --Ray


        • #5
          Got the same warning: "already scanned"!

          This is the error message I am getting:

          Additional start URL invalid or already scanned: http://sos-planet.org/documents/tanks-building/

          I have the latest version,
          PHP,
          using the script online only,
          many categories,
          everything neatly organized in different folders like documents/topic1, documents/topic2 and so on.

          I checked all the start indexing points and the /documents/tanks-building/ is just listed once.

          Any suggestions?

          Roger Pilon
          sos-planet.org

          NOTE: sorry, wrong forum category. This should have been posted in 6.0.
          Last edited by Ponics; Mar-07-2010, 04:40 PM.


          • #6
            This URL has almost certainly been visited by the spider already. You will find the details in the index log.

            If you can't find an earlier reference to this URL in the log, send us more details.

            Send us your ZCFG file (via email - see our Contact Us page) and it would also help if you save your index log after indexing (click on "File" -> "Save index log to file") and send us this as well. Make sure you have turned on "Verbose mode" before indexing so that we'll see all skip messages.
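
For anyone checking a long saved index log by hand, a few lines of script can list every line that mentions the URL in question. This is a rough sketch under assumptions: it only presumes the saved log is plain text, `find_mentions` is a hypothetical helper, and the "PLACEHOLDER" lines in the sample are made up for the demo and are not Zoom's real log format (only the final line is the message quoted in #5).

```python
def find_mentions(log_lines, url):
    """Return (line_number, text) for every log line that mentions url.

    The trailing slash is ignored so that both .../tanks-building and
    .../tanks-building/ are matched.
    """
    needle = url.rstrip("/")
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if needle in line
    ]


# Made-up sample lines standing in for a saved index log (format is an assumption).
sample_log = [
    "PLACEHOLDER: visited http://sos-planet.org/documents/",
    "PLACEHOLDER: visited http://sos-planet.org/documents/tanks-building/",
    "PLACEHOLDER: visited http://sos-planet.org/documents/topic2/",
    "Additional start URL invalid or already scanned: "
    "http://sos-planet.org/documents/tanks-building/",
]

hits = find_mentions(sample_log, "http://sos-planet.org/documents/tanks-building/")
for number, text in hits:
    print(f"line {number}: {text}")

# With a real saved log, pass the open file instead of the sample list, e.g.
# with open("index_log.txt", encoding="utf-8", errors="replace") as log_file:
#     hits = find_mentions(log_file, url)
# ("index_log.txt" is a hypothetical file name.)
```

Any hit that appears before the "already scanned" message is a candidate for the earlier visit, and the lines around it should show which start point was being crawled at the time.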


            • #7
              Just wanted to add:

              Originally posted by Ponics:
              "I checked all the start indexing points and the /documents/tanks-building/ is just listed once."

              Note that confirming this URL doesn't appear twice in the start point list doesn't mean it wasn't previously indexed.

              If a prior start point is allowed to crawl pages (that is, it was not set to "Index single page only"), then it is possible that a prior start point had links which allowed the spider to find this page ("/documents/tanks-building/") already.
              --Ray
