Will the spider crawl this link?

  • Will the spider crawl this link?

    I have a small PHP interface for my website, and the files are laid out like this:

    mywebsite/userfolder/files/2011/html/file1.html etc.

    There are no index files in:
    /files/
    /2011/
    /html/

    There is an index.php at:
    mywebsite/userfolder/index.php

    I guess my question is: Can the spider crawl directories with no index?

    TIA -- Jae

  • #2
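    If your PHP interface and the rest of your pages contain hypertext links that can be followed to reach every file in those folders, then those files will be indexed (using Spider Mode). A folder without an index page is not a problem in itself; what matters is that each file is linked from a page the spider reaches.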

    Spider mode follows the links on each page. Just make sure your base URL setting and skip options permit the necessary links to be followed.
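To illustrate the point about links, here is a minimal sketch of a link-following crawler. It is not Zoom's implementation; `crawl`, `LinkExtractor`, the `fetch` callable, and the `http://mywebsite/` host are all hypothetical names used for illustration. It shows that a file in a folder with no index page is found only if some crawled page links to it, and that links outside the base URL are ignored.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, base_url, fetch):
    """Follow links breadth-first from start_url.

    fetch(url) is an assumed callable that returns the page HTML,
    or None if the page does not exist. Only URLs that start with
    base_url are followed. Returns the set of pages that were fetched.
    """
    queued = {start_url}
    fetched = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(base_url) and absolute not in queued:
                queued.add(absolute)
                queue.append(absolute)
    return fetched


# A tiny in-memory "site" standing in for a real web server (hypothetical).
BASE = "http://mywebsite/userfolder/"
SITE = {
    BASE + "index.php": '<a href="files/2011/html/file1.html">one</a>',
    BASE + "files/2011/html/file1.html": (
        '<a href="file2.html">two</a>'
        '<a href="http://elsewhere/page.html">external</a>'
    ),
    BASE + "files/2011/html/file2.html": "no links here",
    BASE + "files/2011/html/orphan.html": "nothing links to this file",
}

pages = crawl(BASE + "index.php", BASE, SITE.get)
```

In this sketch `file1.html` and `file2.html` are reached through links even though `/files/`, `/2011/`, and `/html/` have no index page, while `orphan.html` is never found because no page links to it.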
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      I got a fairly good index on my first try, but nothing from the PHP pages. I'm trying it again with the default index mode. (See... I try to get fancy...)

      But, I am concerned about what I see when I follow the Spider on the control panel: Is it actually downloading files? Surely not. It's not creating a mirror on my computer, is it? Maybe just downloading link info?

      TIA - Jae

      • #4
        The spider mode downloads pages as any browser downloads a page when a user visits a page on your web site.

        Some people mistakenly think a page is only "downloaded" when you right-click and select "Download and save to disk". In reality, a page is ALWAYS downloaded from the web server when you view it; the only difference between the two cases is whether the page is saved to disk.

        So no, it is not creating a mirror on your computer, since the downloaded pages are not saved. However, the effect on your server is the same as opening your browser and clicking every link on your web site.

        Downloading pages like this, in spider mode, is the only way to index dynamically generated PHP pages (which are scripts that must be processed and served by the web server).
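The difference between downloading and saving can be sketched as follows. This is an illustration only, not how Zoom works internally; `index_page`, `TextExtractor`, and the `fetch` parameter are hypothetical names. The page is read into memory, its words are added to an index, and the page text is then discarded without anything being written to disk.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects the lowercase words found in a page's text nodes."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def download(url):
    """Fetch a page over HTTP into memory, as a browser would."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def index_page(url, index, fetch=download):
    """Download one page, record its words in index, keep no copy.

    index maps each word to the set of URLs that contain it.
    fetch is an assumed callable so the sketch can run without a network.
    """
    html = fetch(url)  # the page is downloaded here
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    for word in parser.words:
        index.setdefault(word, set()).add(url)
    # html is dropped when the function returns; no file is created.


# Example with a stand-in fetch function instead of a real server.
index = {}
index_page(
    "http://mywebsite/userfolder/index.php",
    index,
    fetch=lambda url: "<p>Hello world</p>",
)
```

Because the web server runs the PHP script before sending the response, what arrives in `html` is the generated output, which is why a dynamically generated page has to be requested from the server to be indexed.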

        If you wish to avoid using internet traffic, you would have to index locally. You can do this with Offline Mode (indexing files on your hard drive), but then you won't be able to index dynamically generated pages like PHP. Otherwise, you would have to run the indexer on the web server itself or on a computer on the same local network as the web server. Another option is to host your own test web server, index that, and use the rewrite index option so the links suit the live server.
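The idea of indexing a test server and then pointing the results at the live server can be sketched as a simple prefix swap on the indexed URLs. This is only an assumption about what such a rewrite amounts to, not a description of Zoom's option; `rewrite_base` and both host names are hypothetical.

```python
def rewrite_base(urls, test_base, live_base):
    """Replace the test-server prefix of each URL with the live prefix.

    URLs that do not start with test_base are returned unchanged.
    """
    rewritten = []
    for url in urls:
        if url.startswith(test_base):
            url = live_base + url[len(test_base):]
        rewritten.append(url)
    return rewritten


indexed = [
    "http://localhost/userfolder/index.php",
    "http://localhost/userfolder/files/2011/html/file1.html",
]
live = rewrite_base(indexed, "http://localhost/", "http://mywebsite/")
```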
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
