How to index sites requiring authentication with Zoom

Q. I can't get authentication to work for spider indexing my site.
Q. How do I index protected parts of my website requiring user authentication?

Check whether your site uses HTTP authentication or cookie-based authentication. Zoom can provide automatic authentication for the former (HTTP authentication), but will require special methods to access websites using the latter (cookie-based authentication).

HTTP authentication

HTTP authentication usually appears as a special login window (when you access the page in your browser) and is a standardised method of authenticating over HTTP, implemented by the web server.

Example 1. A typical website with HTTP authentication

If your website uses HTTP authentication, you can simply enter your login information into Zoom (under the "Authentication" tab of the Configuration window). The spider will then log in automatically when required and index the protected parts of your website.
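If you are unsure which kind of authentication your site uses, the server's response to an unauthenticated request usually tells you. The following is a minimal sketch, not part of Zoom: the function names are made up for illustration, and it assumes you have already obtained the status code and response headers by some other means (for example, from your browser's developer tools). A 401 status with a "WWW-Authenticate" header indicates HTTP authentication; the second function shows what a client sends back for the Basic scheme only.

```python
import base64


def uses_http_auth(status_code, headers):
    """Return True if a response looks like an HTTP authentication challenge.

    `headers` is a plain dict of response header names to values.
    Header names are compared case-insensitively.
    """
    names = {name.lower() for name in headers}
    return status_code == 401 and "www-authenticate" in names


def basic_auth_header(username, password):
    """Build the Authorization header value for the HTTP Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

A page that instead returns a normal 200 response containing a login form, or redirects to one, is most likely using cookie-based authentication.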

Cookie-based or session-based authentication

Cookie-based authentication, however, usually appears as a form on a page, and is implemented by server-side scripts (such as PHP, ASP or ColdFusion). Because there is no standard way of implementing it, Zoom is unable to log in automatically to access the protected web page. However, there are alternative methods to work around this.

Example 2. A typical website with cookie-based (or session-based) authentication

If your website uses cookie or session-based authentication, try the following:

  1. Log in to the site via Internet Explorer, then immediately afterwards (without closing IE), start indexing from Zoom, making sure it starts spidering from a page within the site rather than visiting the login page again. The cookie set in Internet Explorer should carry across to Zoom, provided the option "Use cookies from Windows and IE" is checked under the "Authentication" tab of the Configuration window. Note that this method will not work with session cookies (see notes below).
  2. If your login page can receive username and password information via the URL, then you can use a spider start point / URL with this information specified as GET parameters (for example, "http://www.mysite.com/login.asp?username=george&password=ringo").
  3. If you can modify the server-side script that does the authentication, you could change it so that it allows a user-agent containing the word "ZoomSpider" to bypass the login process. Similarly, you could also allow the IP address of the indexing computer to bypass the login process.
  4. If possible, consider using Offline mode to index your website. This requires a copy of the website to be accessible on your local hard disk, allowing Zoom to simply scan all the files without having to get past the security restrictions on your live site. Note, however, that Offline mode is not suited to websites which depend heavily on server-side scripting to deliver content (e.g. PHP or ASP driven websites). See the Users Guide for more information on Spider mode and Offline mode.
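As an illustration of method 3, the check added to a login script might look something like the sketch below. It is written in Python for brevity; a real site would implement the same logic in its own scripting language. The function name, the IP address and the idea of requiring both conditions together are assumptions made for this example. Only the "ZoomSpider" token comes from the text above. Requiring the IP address as well is a cautious choice, since any client can send an arbitrary User-Agent header.

```python
import ipaddress

SPIDER_TOKEN = "ZoomSpider"             # token named in method 3 above
TRUSTED_INDEXER_IPS = {"203.0.113.10"}  # hypothetical address of the indexing computer


def may_bypass_login(user_agent, remote_ip, trusted_ips=TRUSTED_INDEXER_IPS):
    """Return True only if the request looks like the spider AND
    comes from a trusted indexing computer."""
    if SPIDER_TOKEN not in (user_agent or ""):
        return False
    try:
        client = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address: never bypass the login.
        return False
    return any(client == ipaddress.ip_address(ip) for ip in trusted_ips)
```

The login script would call this at the point where it normally redirects unauthenticated visitors to the login form, and skip the redirect when it returns True.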

Important: If you are using one of the above methods to allow the spider to log in to your cookie or session-based authenticated site, you need to make sure that the spider does not follow a link to the "logout" page and log itself out of your website. You can prevent this by specifying the logout page in the "Skip pages and folder list" (in the Configuration window, under the "Skip options" tab), e.g. "logout.asp" or "&logout=1".
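To see which URLs a given skip entry would catch before you start indexing, you can test candidate URLs against your entries. The sketch below assumes that a skip entry matches when it appears anywhere in the URL and that case is ignored. That is an assumption suggested by the examples above, not a statement of how Zoom matches internally, so check the Users Guide for the exact rules. The function name is made up for this example.

```python
SKIP_LIST = ["logout.asp", "&logout=1"]  # the example entries from the text


def should_skip(url, skip_list=SKIP_LIST):
    """Return True if any skip entry occurs as a substring of the URL
    (case-insensitive; assumed matching rule, see lead-in)."""
    lowered = url.lower()
    return any(entry.lower() in lowered for entry in skip_list)
```

For instance, under this assumption "http://www.mysite.com/logout.asp" and "http://www.mysite.com/index.asp?page=2&logout=1" would both be skipped, while "http://www.mysite.com/members/index.asp" would not.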

Notes regarding persistent and session cookies

If your website uses cookies for authentication, you should check whether the cookies are persistent or session based.

Persistent cookies are stored for a specified length of time. These cookies can retain information between visits to a site, and are typically offered through a "Remember my login information" option on your login page.

Session cookies store information only for the duration of a session or a single browser window. These cookies are deleted, and so become invalid, when the session is terminated (e.g. when you close your browser window). If your site uses session cookies, note that some of the methods listed above (namely #1) will not work.
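One way to tell the two kinds apart is to look at the "Set-Cookie" header your login page sends: a cookie with an "Expires" or "Max-Age" attribute is persistent, and one without either is a session cookie. The following minimal sketch uses Python's standard library to make that check. The function name and the sample cookie values are made up for illustration, and it assumes you have copied the header value from your browser's developer tools or a similar source.

```python
from http.cookies import SimpleCookie


def classify_cookies(set_cookie_header):
    """Map each cookie name in a Set-Cookie header value to
    'persistent' or 'session'."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    result = {}
    for name, morsel in jar.items():
        # Attributes that were not set are stored as empty strings.
        persistent = bool(morsel["expires"] or morsel["max-age"])
        result[name] = "persistent" if persistent else "session"
    return result
```

If the cookie that carries your login is reported as "session", method 1 above is unlikely to work and one of the other methods is a better choice.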
