Not all PDFs being indexed


  • Not all PDFs being indexed

    I have a page at http://www.noosawaters.org/docs.html with links to 48 PDF files. However, only 30 of them are being downloaded and indexed. Two of the files not being indexed are:
    2006 Census of Population and Housing.pdf
    Notes_on_Meeting_with_Developer_Nov07.pdf

    Rather than include the log here, I've put it at:
    http://www.noosawaters.org/zoom/noosawaterslog_201106061007.txt

  • #2
    We opened up a browser and went to this page:
    http://www.noosawaters.org/docs.html

    It redirected us to a login page:
    http://www.noosawaters.org/nwra_logon.html

    Evidently, that page requires a login before it will show the links to the PDFs you mentioned.

    Looking at your log, it doesn't seem like you have set up authentication in the Indexer to allow for this. Many other pages are also redirecting to the login page and not being indexed properly.

    Please see this FAQ for details on how to set up authentication:
    Q. How do I index protected parts of my website requiring user authentication?


    Judging from appearances, your page uses cookie-based/session authentication, so please see the relevant section under that FAQ.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      Silly me. I should have figured that out for myself. I fixed it with:

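      <?php
      // Restrict access by default; lift the restriction only for the Zoom spider.
      $restricted = true;
      $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
      if (preg_match('/ZoomSpider/', $agent)) {
          $restricted = false;
      }

      if ($restricted) {
          // send them to the log in page
      }
      ?>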

      While I'm here, may I compliment you on the way you look after your customers. If all software companies were as good as you, the world would be a happier and less frustrating place.


      • #4
        Glad you've got it working. Yes, it's often easier to just identify the spider if you have control of how the authentication is enforced. More notes on how to identify the spider can be found here.
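For anyone taking the same route, here is a minimal sketch of how the spider check might be wrapped in a reusable function. It assumes the spider's User-Agent string contains "ZoomSpider", as in the snippet posted above. The function name `isZoomSpider`, the optional IP allow-list, and the example IP address are hypothetical and only there for illustration. Since a User-Agent header can be set by any client, also checking the address the indexer runs from may be worth considering.

```php
<?php
// Hypothetical helper: decide whether a request looks like the Zoom indexer.
// Assumption: the spider's User-Agent contains "ZoomSpider".
// $allowedIps is an optional, illustrative allow-list of indexer addresses.
function isZoomSpider(array $server, array $allowedIps = []): bool
{
    $agent = $server['HTTP_USER_AGENT'] ?? '';
    if (strpos($agent, 'ZoomSpider') === false) {
        return false; // not the spider's User-Agent
    }
    if ($allowedIps === []) {
        return true; // no IP restriction configured
    }
    $ip = $server['REMOTE_ADDR'] ?? '';
    return in_array($ip, $allowedIps, true);
}

// Usage: 203.0.113.10 is a placeholder address, not a real indexer IP.
if (!isZoomSpider($_SERVER, ['203.0.113.10'])) {
    // send them to the log in page
}
```

Passing an empty allow-list falls back to the User-Agent check alone, which matches the behaviour of the original snippet.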

        And thank you too for the positive feedback. Always glad to know our efforts are worthwhile!
        --Ray
