PHPBB Attachments

  • PHPBB Attachments

    I am using Zoom version 6.0 Pro on my company intranet.

    Everything works, in that I can spider phpBB3's forums and topics, except for attachments.

    The detailed log shows that the attachment's file/link is being spidered and indexed.

    http://.../forum/download/file.php?id=2?sid=...

    However, the plugin portion of the log does not show any PHPBB attachments being processed even though they appear in both the spider and index logs.

    All plugins are installed and are successfully processing PDFs and Word docs in other areas of the intranet. However, none of the attachment file types are being processed in phpBB, even though the logs show that the links are being spidered and indexed.

    I do not have 'download' or 'file.php' in any filter or skip options.

    Any ideas?

  • #2
    If the download PHP pages are serving the PDF or Word files, then Zoom would be able to index them.

    However, many bulletin boards have a feature where attachments are not accessible unless you are logged in. I would suspect that your spider is not configured to log in to the forum, and thus phpBB is not serving the attachment files through these links. Instead, it is most likely serving a page which says "You need to log in to download the file". This would be why the download links appear to be indexed but no plugin processing was required.

    You should be able to verify this by first trying to access the download links from a browser without logging in. Second, try it with "wget" or a similar tool that does not identify itself as a browser, in case phpBB behaves differently depending on whether it is accessed by IE or Firefox versus a search engine spider.
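The non-browser check described above could also be scripted. The following is only a minimal sketch: `classify_response`, `probe`, and the `attachment-probe/0.1` user agent are hypothetical names made up for illustration, and the HTML detection is a rough heuristic, not how Zoom or phpBB decides anything. It fetches a URL without any login cookies and reports whether the server returned an HTML page (such as a login notice) or something that looks like a real file.

```python
import urllib.error
import urllib.request


def classify_response(status, content_type, first_bytes):
    """Guess whether a response is a real attachment or an HTML page.

    Heuristic only: a non-200 status counts as an error, and an HTML
    media type or an HTML-looking body counts as a page, not a file.
    """
    if status != 200:
        return "error"
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html-page"
    start = first_bytes.lstrip().lower()
    if start.startswith((b"<!doctype html", b"<html")):
        return "html-page"
    return "file"


def probe(url, user_agent="attachment-probe/0.1"):
    """Fetch url with no cookies and a non-browser User-Agent (hypothetical)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            head = response.read(512)  # a small prefix is enough for the check
            return classify_response(
                response.status, response.headers.get("Content-Type"), head
            )
    except urllib.error.HTTPError:
        # urlopen raises for 4xx/5xx responses
        return "error"
```

Calling `probe()` with one of the download URLs from the spider log should return `"file"` if the attachment is really being served to anonymous clients, and `"html-page"` if a login or error page is coming back instead.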

    If you can also give us the actual log messages for the URLs in question, it might give us more information. Enable Skip messages, in case the files are actually being skipped with a given reason.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      The attachments can be accessed by a forum guest without having to log in.

      I tried wget and it seems to be able to access the attachment as well.

      There is no record in the log of any attachments being skipped.

      --

      Let me ask this.

      I noticed that where phpBB stores its files (on the filesystem, not in the database), it "renames" each file to something like this:

      from: test.doc

      to: 2_B0dcgdurtnhgyrncljdf

      Note that there is no file extension in the renaming.

      Could this be why Zoom is not able to index PHPBB's attachments?



      phpBB's /download/file.php translates the renamed file back to its original name and then sends the file to the browser.



      • #4
        If the forum is accessible online, can you give us the URL so we can take a look for ourselves? Please give specific examples of the attachments you are testing with.

        Better yet, e-mail us your ZCFG file so we can be sure to use the same settings you are.

        The way phpBB stores the files in its filesystem shouldn't have anything to do with it. So long as phpBB sends the file across HTTP and provides the correct content-type, it should be picked up.
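To see what the download script actually sends over HTTP, it can help to look at the response headers rather than the stored file name. The sketch below is an illustration only: `describe_attachment` is a hypothetical helper, and it makes no claim about which of these headers Zoom itself relies on. It takes a mapping of response headers and pulls out the content type plus the original file name and extension from Content-Disposition, if the server supplies one.

```python
import os
from email.message import Message


def describe_attachment(headers):
    """Return (content_type, filename, extension) from HTTP response headers.

    `headers` is a plain dict of header name to value. The stdlib
    email.message.Message class is used here only as a header parser.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    content_type = msg.get_content_type()
    # get_filename() reads the filename parameter of Content-Disposition
    filename = msg.get_filename()
    extension = os.path.splitext(filename)[1].lower() if filename else ""
    return content_type, filename, extension


# Example headers for a Word attachment (values are illustrative)
example = {
    "Content-Type": "application/msword",
    "Content-Disposition": 'attachment; filename="test.doc"',
}
print(describe_attachment(example))
```

If the content type comes back as something generic, or no file name is supplied at all, then the only remaining hint about the file type is the URL itself, which in this case ends in file.php rather than .doc or .pdf.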

        Here's something else to check. On the "Scan options" panel of the Configure tab in Zoom, do you have the ".pdf" extension set to "Acrobat document" file type, and ".doc" set to "Word document"? Or are these set to something else?
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          I am using Zoom and phpBB on my corporate intranet and do not have an external URL.

          The extensions are set up correctly for their respective file types. Keep in mind that Zoom can index other attachments located elsewhere in the intranet's CMS.

          I emailed the .zcfg and log files.

