Problem when Indexing in Spider mode


  • Problem when Indexing in Spider mode

    Hi,

    I am trying to index my website, divinewellness.com, but I get the following error: "Check that the URL exists and satisfies the settings in the configuration window".
    The URL I am giving for indexing works properly when I open it in a web browser.
    URL I am giving: divinewellness.com
    Please help me.


    Thanks

  • #2
    Please check the Log tab, and make sure to enable the displaying of all messages.

    You most likely have a Skipped message explaining what was skipped. When I tested this quickly I got this:

    18:20:00 - [DOWNLOAD] Downloading file http://divinewellness.com/
    18:20:00 - [DOWNLOAD] URL redirected to: http://www.divinewellness.com/ [thread #1]
    18:20:00 - [SKIPPED] Skipping http://www.divinewellness.com/ (External site - does not match base URL)
    18:20:07 - [ERROR] No files found to spider from http://divinewellness.com/
    18:20:07 - Indexing failed
    Because your site redirects to the "www." host, the redirected URL no longer matches the base URL, so it is skipped by default. This is to prevent indexing across many sub-sites if they belong to the same domain name.

    You can change this behaviour by changing the Base URL setting (see the Users Guide on how to specify multiple base URLs)

    But the easiest way to address this would be to just use the same URL as you are redirecting to; that is, change your Start Spider URL to:
    http://www.divinewellness.com/

    This will also avoid all the issues with duplicate URLs and having some results go to the "www." domain and some without, etc.
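
To illustrate why the redirect in the log above gets skipped, here is a minimal Python sketch that compares the host of the start URL with the host of the redirect target. The function names are made up for this example, and the plain host comparison is an assumption used for illustration; it is not Zoom's actual base URL matching logic.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def redirect_leaves_host(start_url, redirected_url):
    """True when the redirect target is on a different host than the start URL.

    Simplified stand-in for a base URL check: "divinewellness.com" and
    "www.divinewellness.com" are treated as two different sites.
    """
    return host_of(start_url) != host_of(redirected_url)


if __name__ == "__main__":
    start = "http://divinewellness.com/"
    target = "http://www.divinewellness.com/"
    print(redirect_leaves_host(start, target))   # start URL without "www."
    print(redirect_leaves_host(target, target))  # start URL equal to the target
```

Starting the spider from the redirect target itself means the two hosts agree, which is the fix suggested above.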
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      My website uses URL rewriting. Does that mean Zoom will not work for my website? Every page URL has been rewritten, so I can't give the redirected URL at the time of indexing.


      • #4
        Did you try what Ray suggested? It should solve your problem.


        • #5
          Yes, but what I am saying is that URL rewriting is used on the website, so I cannot give the redirected URL.


          • #6
            ...so i can not give redirected url
            I don't know what you are talking about. We already told you what your site is doing in the log above.

            http://divinewellness.com/ redirects to http://www.divinewellness.com/

            So as Ray says, the easiest way to address this would be to just use the same URL as you are redirecting to, that is, change your Start Spider URL to:
            http://www.divinewellness.com/


            • #7
              Yes, I have tried this also. I changed my spider start URL from
              http://divinewellness.com/ to http://www.divinewellness.com/
              but it's giving the same error message:
              "Check that the URL exists and satisfies the settings in the configuration window"


              • #8
                Works fine for me. Can you post your log?

                Here is the start of the log I get:
                21:59:26 - Spider from: http://www.divinewellness.com/
                21:59:26 - Web site URL: http://www.divinewellness.com/
                21:59:26 - Estimated RAM required during index process: 125212 KB
                21:59:27 - [DOWNLOAD] Downloading robots.txt file found at http://www.divinewellness.com/robots.txt
                21:59:28 - Initiating HTTP session (thread #1) ...
                21:59:28 - [DOWNLOAD] Downloading file http://www.divinewellness.com/
                21:59:31 - Initiating HTTP session (thread #2) ...
                21:59:32 - [DOWNLOAD] Downloading file http://www.divinewellness.com/index.aspx
                21:59:32 - [DOWNLOAD] Downloading file http://www.divinewellness.com/Register.aspx
                21:59:32 - [INDEXED] Indexing http://www.divinewellness.com/
                21:59:34 - [DOWNLOAD] Downloading file http://www.divinewellness.com/Yoga-Section/1/Yoga.htm
                21:59:34 - [INDEXED] Indexing http://www.divinewellness.com/index.aspx
                21:59:37 - [DOWNLOAD] Downloading file http://www.divinewellness.com/yoga-category/1/introduction-to-yoga.htm
                21:59:37 - [DOWNLOAD] Downloading file http://www.divinewellness.com/yoga-category/129/philosophy-of-yoga.htm
                21:59:37 - [INDEXED] Indexing http://www.divinewellness.com/Register.aspx
                21:59:38 - [INDEXED] Indexing http://www.divinewellness.com/Yoga-Section/1/Yoga.htm


                • #9
                  Here is my log

                  16:20:08 - Start indexing (spider mode) at Tue Oct 19 16:20:07 2010
                  16:20:16 - [ERROR] No files found to spider from http://www.divinewellness.com/
                  16:20:54 - Start indexing (spider mode) at Tue Oct 19 16:20:54 2010
                  16:20:58 - [ERROR] No files found to spider from http://www.divinewellness.com/
                  16:42:42 - Start indexing (spider mode) at Tue Oct 19 16:42:42 2010
                  16:42:46 - [ERROR] No files found to spider from http://www.divinewellness.com/


                  • #10
                    I think you have only posted a very small portion of the log. Can you click the "Show all" button in the log window and check whether you see more details? You can then right-click in the log window to copy and paste the log.


                    • #11
                      Here is my complete log after I clicked show all


                      11:19:26 - Start indexing (spider mode) at Wed Oct 20 11:19:26 2010
                      11:19:26 - Maximum number of words: 15000
                      11:19:26 - Maximum number of files: 50
                      11:19:26 - Will scan files with extensions
                      11:19:26 - .htm
                      11:19:26 - .html
                      11:19:26 - .txt
                      11:19:26 - .php
                      11:19:26 - .asp
                      11:19:26 - .cgi
                      11:19:26 - .aspx
                      11:19:26 - .pl
                      11:19:26 - .php3
                      11:19:26 - Spider from: http://www.divinewellness.com/
                      11:19:26 - Web site URL: http://www.divinewellness.com/
                      11:19:26 - Estimated RAM required during index process: 5670 KB
                      11:19:27 - [DOWNLOAD] Downloading robots.txt file found at http://www.divinewellness.com/robots.txt
                      11:19:27 - Initiating HTTP session (thread #1) ...
                      11:19:27 - DL Thread #1, got URL (http://www.divinewellness.com/) off queue
                      11:19:27 - [DOWNLOAD] Downloading file http://www.divinewellness.com/ (101454 bytes)
                      11:19:29 - [WARNING] Could not download file: http://www.divinewellness.com/ (File size limit exceeded)
                      11:19:29 - Initiating HTTP session (thread #2) ...
                      11:19:30 - [ERROR] No files found to spider from http://www.divinewellness.com/
                      11:19:30 - Indexing failed
                      11:19:30 - Waiting for threads to finish ...
                      11:19:30 - Cleaning up memory used for index data... please wait.
                      11:19:30 - Finished cleaning up memory.


                      • #12
                        Did you see the red Warning message?

                        Originally posted by ratripathi
                        11:19:27 - [DOWNLOAD] Downloading file http://www.divinewellness.com/ (101454 bytes)
                        11:19:29 - [WARNING] Could not download file: http://www.divinewellness.com/ (File size limit exceeded)
                        This indicates that the file is over the maximum file size limit. The limit is specified under "Configure"->"Limits" in the Professional and Enterprise Editions. Zoom cannot continue if the first page cannot be indexed.

                        If you are using the Free Edition, you cannot index pages which are over 100,000 bytes in size (each).

                        If you have another page on the site which is smaller and contains links to the rest of your site, you may get further by using it as the start URL. But it still means a lot of the other pages on your site will be skipped.

                        On closer inspection, your webpages are larger than most sites because you have embedded a 45KB VIEWSTATE on every page.

                        This would not be an issue with the registered editions of the software.

                        More information on the editions here:
                        http://www.wrensoft.com/zoom/editions.html
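
For anyone who wants to check their own pages against this limit, the following is a rough Python sketch that reports a page's total size and how much of it is taken up by the ASP.NET `__VIEWSTATE` hidden field. The 100,000 byte figure is the Free Edition limit quoted above; the function name, the regular expression, and the returned dictionary keys are assumptions made for this example and have nothing to do with how Zoom itself measures pages.

```python
import re

# Free Edition per-page limit in bytes, as quoted in the post above.
FREE_EDITION_LIMIT = 100_000

# Matches the hidden field ASP.NET emits, e.g.
# <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
# Assumes double-quoted attributes with name appearing before value.
VIEWSTATE_RE = re.compile(
    r'<input[^>]*name="__VIEWSTATE"[^>]*value="([^"]*)"', re.IGNORECASE
)


def page_report(html_bytes, limit=FREE_EDITION_LIMIT):
    """Summarise page size and the share used by the VIEWSTATE field."""
    text = html_bytes.decode("utf-8", errors="replace")
    match = VIEWSTATE_RE.search(text)
    # VIEWSTATE is base64 text, so its character count equals its byte count.
    viewstate = len(match.group(1)) if match else 0
    total = len(html_bytes)
    return {
        "total": total,
        "viewstate": viewstate,
        "over_limit": total > limit,
        "over_limit_without_viewstate": (total - viewstate) > limit,
    }


if __name__ == "__main__":
    # Synthetic page: a large VIEWSTATE value plus some filler content.
    sample = (
        b'<html><input type="hidden" name="__VIEWSTATE" value="'
        + b"A" * 45000
        + b'" />'
        + b"x" * 60000
        + b"</html>"
    )
    print(page_report(sample))
```

To use it on a real page, pass in the raw bytes of the downloaded HTML (for example, a file saved from the browser) instead of the synthetic sample.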
                        --Ray
                        Wrensoft Web Software
                        Sydney, Australia
                        Zoom Search Engine
