I have a page at http://www.noosawaters.org/docs.html with links to 48 PDF files. However, only 30 of them are being downloaded and indexed. Two of the files not being indexed are:
2006 Census of Population and Housing.pdf
Notes_on_Meeting_with_Developer_Nov07.pdf
Rather than include the log here I've put it at
http://www.noosawaters.org/zoom/noosawaterslog_201106061007.txt
We opened up a browser and went to this page:
http://www.noosawaters.org/docs.html
It redirected us to a login page:
http://www.noosawaters.org/nwra_logon.html
Evidently, that page requires a login before the links to the PDFs you mentioned become visible.
Looking at your log, it doesn't seem that you have set up authentication in the Indexer to allow for this. Many other pages also redirect to the login page and so are not indexed properly.
Please see this FAQ for details on how to set up authentication:
Q. How do I index protected parts of my website requiring user authentication?
Judging from appearances, your page uses cookie-based/session authentication, so please see the relevant section under that FAQ.
Silly me. I should have figured that out for myself. I fixed it with:
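<?php
// Restrict access by default; lift the restriction only for the Zoom spider.
$restricted = true;
// Quote the array key, and fall back to an empty string if no User-Agent header was sent.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/ZoomSpider/', $userAgent)) $restricted = false;
if ($restricted) {
    // send them to the log in page
}
?>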
While I'm here may I compliment you on the way you look after your customers. If all software companies were as good as you the world would be a happier and less frustrating place.
Glad you've got it working. Yes, it's often easier to just identify the spider if you have control of how the authentication is enforced. More notes on how to identify the spider can be found here.
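For anyone else taking this route, here is a minimal sketch of how the check might be wrapped up for reuse across protected pages. It assumes the spider's user agent contains the string "ZoomSpider", as in the snippet above. The function names `isZoomSpider` and `needsLogin` are hypothetical, made up for this illustration, and the `$loggedIn` flag stands in for however your site tracks a logged-in visitor.

```php
<?php
// Hypothetical helper: true when the User-Agent header contains "ZoomSpider".
// $server is expected to be an array shaped like $_SERVER.
function isZoomSpider(array $server): bool
{
    // A request may carry no User-Agent header at all, so default to ''.
    $userAgent = $server['HTTP_USER_AGENT'] ?? '';
    return strpos($userAgent, 'ZoomSpider') !== false;
}

// Hypothetical helper: true when the visitor should be sent to the login page.
// $loggedIn is an assumption standing in for your own session check.
function needsLogin(array $server, bool $loggedIn): bool
{
    return !$loggedIn && !isZoomSpider($server);
}
```

A protected page could then call `needsLogin($_SERVER, $loggedIn)` at the top and redirect to the login page when it returns true. Bear in mind that any client can send whatever user agent string it likes, so this check lets through anyone who claims to be the spider; it may be worth combining it with another condition, such as the IP address of the machine running the Indexer, if the PDFs are sensitive.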
And thank you too for the positive feedback. Always glad to know our efforts are worthwhile!