Thread: Not all PDFs being indexed

  1. #1
    Join Date
    Aug 2010
    Location
    Queensland, Australia
    Posts
    6

    Not all PDFs being indexed

    I have a page at http://www.noosawaters.org/docs.html with links to 48 PDF files, but only 30 of them are being downloaded and indexed. Two of the files not being indexed are:
    2006 Census of Population and Housing.pdf
    Notes_on_Meeting_with_Developer_Nov07.pdf

    Rather than include the log here I've put it at
    http://www.noosawaters.org/zoom/noosawaterslog_201106061007.txt

  2. #2
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,586

    We opened up a browser and went to this page:
    http://www.noosawaters.org/docs.html

    It redirected us to a login page:
    http://www.noosawaters.org/nwra_logon.html

    Evidently, that page requires a login before the links to the PDFs you mentioned can be seen.

    Looking at your log, it doesn't seem that you have set up authentication to allow the Indexer to log in. Many other pages also redirect to the login page and so are not indexed properly.

    Please see this FAQ for details on how to set up authentication:
    Q. How do I index protected parts of my website requiring user authentication?


    Judging from appearances, your page uses cookie-based/session authentication, so please see the relevant section under that FAQ.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

  3. #3
    Join Date
    Aug 2010
    Location
    Queensland, Australia
    Posts
    6

    Silly me. I should have figured that out for myself. I fixed it with:

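    <?php
    // Require a login by default.
    $restricted = true;
    // Let the Zoom indexer through; it identifies itself as "ZoomSpider" in its User-Agent.
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (preg_match('/ZoomSpider/', $userAgent)) {
        $restricted = false;
    }

    if ($restricted) {
        // send them to the log in page
    }
    ?>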

    While I'm here, may I compliment you on the way you look after your customers. If all software companies were as good as you, the world would be a happier and less frustrating place.

  4. #4
    Join Date
    Dec 2004
    Location
    Sydney, Australia
    Posts
    3,586

    Glad you've got it working. Yes, it's often easier to just identify the spider if you have control of how the authentication is enforced. More notes on how to identify the spider can be found here.
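
For anyone wanting to reuse the idea, here is a minimal sketch of the same check wrapped in two small functions. The names is_zoom_spider and needs_login are hypothetical (they are not part of Zoom), and the only thing taken from the post above is that the spider's User-Agent contains "ZoomSpider".

```php
<?php
// Hypothetical helper: true when the User-Agent string names the Zoom spider.
function is_zoom_spider(?string $userAgent): bool
{
    // stripos() does a case-insensitive substring search, so no regex is needed.
    return $userAgent !== null && stripos($userAgent, 'ZoomSpider') !== false;
}

// Hypothetical helper: true when the request should be sent to the login page.
// $loggedIn stands in for whatever session check the site already performs.
function needs_login(?string $userAgent, bool $loggedIn): bool
{
    return !$loggedIn && !is_zoom_spider($userAgent);
}

// Possible use at the top of a protected page (assumes PHP 7.1 or later):
//   if (needs_login($_SERVER['HTTP_USER_AGENT'] ?? null, $loggedIn)) {
//       header('Location: /nwra_logon.html');
//       exit;
//   }
```

Keep in mind that the User-Agent header is supplied by the client, so any visitor can send "ZoomSpider" and skip the login. This approach suits content that is meant to be searchable anyway, not genuinely private files.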

    And thank you too for the positive feedback. Always glad to know our efforts are worthwhile!
    --Ray
