I'm having the same problem...

  • I'm having the same problem...

    Hi,

    I'm using Zoom Pro. My base URL is:

    http://www.sitename.com/extranet/

    I have files (PDFs, DOCs, etc.) located in:

    http://www.sitename.com/extranet/plu...nager/ftpfiles

    Only 3 files (PDFs) out of many are getting indexed, and I don't see any errors on the screen. When the spider hits the directory, the first line is white and reads "Index Thread got ready buffer...", the second is "Processing PDF" (blue), and the third is "scanning http://....". It then seems to leave the directory. Only 3 of roughly 62 files are indexed.
    Jim Blackburn

  • #2
    Are you indexing in Offline mode or Spider mode?

    I assume spider mode.

    In spider mode, only files that have links pointing to them are indexed. If there are no links to a document, it will not be indexed.
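As a rough way to check this from the outside, the sketch below lists the document links that a plain link-following crawler could discover in a page's HTML. It is not Zoom's own logic: the `document_links` helper, the `LinkCollector` class, the extension list, and the example.com URLs are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: the document types of interest end in these extensions.
DOC_EXTENSIONS = (".pdf", ".doc")


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def document_links(html, base_url):
    """Return linked URLs whose path ends in a known document extension."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [
        url for url in collector.links
        if urlparse(url).path.lower().endswith(DOC_EXTENSIONS)
    ]
```

A link such as `files/a.pdf` is reported, while a script link such as `getfile.php?id=15` is not, because nothing in its URL identifies it as a document.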

    ---------
    David
    Wrensoft


    • #3
      More info

      Hi David,

      I am using Spider mode as it is a dynamic site. The files are linked via a "download manager". The link code looks like this:

      http://www.sitename.com/extranet/plu...&p13_fileid=15
      Jim Blackburn


      • #4
        Take a look at this FAQ, and check if this explains some of the reasons why not all of your files are found:
        http://www.wrensoft.com/zoom/support...spider_finding

        Since you have a "download manager" or some sort of script that handles the linking to the PDF files, it is most likely that this is not providing spider-friendly hypertext links to the actual locations of the files. We can't tell without seeing the actual site and the links in question.

        Another problem with using such a "download manager" is that PDFs and DOCs may not be recognized in their correct format, because the URL for the link is actually a PHP file (e.g. "getfile.php"). If your download manager redirects the spider to the actual location of the file, then it will be OK. However, if it serves the file from the same URL, then the spider will mistakenly assume it is a PHP file.
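The two cases described above can be told apart from the HTTP response that the download script returns. The following is a minimal sketch, assuming you have already captured the status code and headers (for example from browser developer tools); the `classify_download` function, its return labels, and the example.com URLs are hypothetical and are not part of Zoom.

```python
from urllib.parse import urlparse

# Standard HTTP status codes that indicate a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_download(url, status, headers):
    """Classify how a download script answered a request.

    Returns "redirect" if the script points at another location,
    "served-in-place" if a .php URL delivers non-HTML data itself,
    and "other" for everything else.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in headers.items()}

    if status in REDIRECT_CODES and "location" in headers:
        return "redirect"

    path = urlparse(url).path.lower()
    # Drop parameters such as "; charset=utf-8" from the content type.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if status == 200 and path.endswith(".php") and content_type != "text/html":
        return "served-in-place"

    return "other"
```

A "served-in-place" result corresponds to the problematic case: the URL looks like a PHP page, but the body is a PDF or DOC file.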

        We plan to add features to support this in the future, but in the meantime, be aware that this style of serving documents (e.g. a "getfile.php" which actually delivers the data for a PDF file) causes other issues. For example, if I click on the link in Internet Explorer, I am presented with an option to "Open" or "Save" the PDF file. If I click on "Open", Acrobat Reader starts up and then fails with a "File not found" error, due to similar issues with the file type and the actual filename differing from the original URL. The same thing seems to happen when I click on the DOC files on your site. It works when the user hits "Save" and manually opens the file afterwards, but this isn't a safe assumption to make for most users.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          Page text...

          I'm assuming at this point that the dynamically created download links are the problem. Given that, I'd be happy for it to pull down the text from the page that gives the file description, but it's not finding that either:

          Page is called downloads and contains links to more pages; the links look like this:

          http://www.sitename.com/extranet/ind...13_sectionid=5

          Clicking the link brings up a page that does contain text, for example, "Initial Assesment Form". If I search for this term, no results are returned. If I look at the index status, this URL is listed as "Queued URL" in yellow. I don't believe I see a matching "scanning" or "downloading" line in green. I did find other download pages in green, though, which means it is scanning and downloading some but not all of them (even though they are queued). What would cause this behavior?

          It's rather difficult to search through the main status log window. Is there a way to save this to disk, or have you thought about making it searchable (or allowing copy and paste)?

          Jim
          Jim Blackburn


          • #6
            You can save the index log to disk. Click on the "File" menu and select "Save index log to file".

            You can then open the file in a text editor and search accordingly. If the URL is queued, there should be a matching "Scanning" message for it. One possibility, however, is that the indexer hits a limit (e.g. "Max unique words indexed" or "Max pages per start pt.") before it gets around to indexing that page from the queue.
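Once the log is saved, the comparison of queued and scanned URLs can also be scripted. The sketch below is only an illustration: the exact layout of the saved log is not shown in this thread, so it assumes one message per line, with the phrases "Queued URL", "Scanning", or "Downloading" and the URL appearing on the same line. The `queued_but_not_scanned` function is a hypothetical helper.

```python
import re

# Assumption: each relevant log line contains the URL as plain text.
URL_PATTERN = re.compile(r"https?://\S+")


def queued_but_not_scanned(lines):
    """Return queued URLs that have no matching scanning/downloading line."""
    queued = []      # keeps first-seen order
    scanned = set()
    for line in lines:
        match = URL_PATTERN.search(line)
        if not match:
            continue
        url = match.group(0)
        lowered = line.lower()
        if "queued url" in lowered:
            if url not in queued:
                queued.append(url)
        elif "scanning" in lowered or "downloading" in lowered:
            scanned.add(url)
    return [url for url in queued if url not in scanned]
```

To run it against a saved log, pass the open file object, for instance `queued_but_not_scanned(open("index_log.txt"))`, where the filename is whatever you chose when saving. Any URL it returns was queued but never reached, which would be consistent with a limit being hit first.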

            If you can't find the answer to your problem, send us a copy of the index log.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
