

Accessing PDF document information fields in Zoom

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #1

    Using: Zoom Search Engine, v. 7.1 (Build: 1000)

    Is it possible to search the standard PDF document information fields (Title, Author, Subject, Keywords) as separate fields (i.e., not as part of the full-text search)? Our PDFs have the document information fields populated. We need to be able to do fielded searches for words, e.g., only in the Title.

    Thank you.

  • #2
    Yes, it is possible; see the following forum post: http://www.wrensoft.com/forum/showth...7029#post17029

    Also, see the following resources for additional information on "Custom Meta Fields".

    User Guide - http://www.wrensoft.com/zoom/usersguide.html
    FAQ - http://www.wrensoft.com/zoom/support...ta_fields.html



    • #3
      Thank you for your reply.

      I added these fields to the Custom Meta Search Fields page: TITLE, KEYWORDS, AUTHOR, SUBJECT (the four PDF document information fields).

      On the "What to Index" page, I unchecked "Meta author", "Meta keywords", and "Title of page" (since these are available as custom meta fields).

      I get the search boxes for title, keywords, author, and subject. All appear to work okay except for title. Nothing I try matches there.

      Looking at the zoom_pageinfo.js file, the custom meta field contents appear to show up in the pageinfo section. I see the keywords, author, and subject content, but not the title content. Here is an excerpt:

      pageinfo = [[0,3707376,0,5,null,null,"TEST2015-8508","Brad P. Adams, Christina Barnes","Cornell University"],
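A quick way to confirm which custom meta values actually reached the index is to parse a row of the pageinfo array and list its text entries. This is a minimal sketch that assumes the row looks exactly like the excerpt above (outer array opened, trailing comma after the row). Which array position maps to which meta field is not documented in this thread, so the sketch only counts the string values rather than naming them.

```python
import json

# One row copied from zoom_pageinfo.js (the excerpt quoted above).
EXCERPT = 'pageinfo = [[0,3707376,0,5,null,null,"TEST2015-8508","Brad P. Adams, Christina Barnes","Cornell University"],'


def parse_first_row(line: str) -> list:
    """Parse the first row of a pageinfo array literal.

    Assumes the line has the form 'pageinfo = [[...],' as in the excerpt:
    the outer array is opened and this row ends with a trailing comma.
    """
    body = line.split("=", 1)[1].strip()  # '[[...],'
    body = body[1:]                        # drop the outer '[' -> '[...],'
    body = body.rstrip(",")                # drop the trailing comma -> '[...]'
    return json.loads(body)                # JS null parses as Python None


row = parse_first_row(EXCERPT)
text_values = [v for v in row if isinstance(v, str)]
print(len(text_values), text_values)
```

With four custom fields configured (TITLE, KEYWORDS, AUTHOR, SUBJECT), only three text values show up in this row, which matches the observation that the title content is missing.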

      If I leave the "Title of page" checked on the "What to Index" page, I cannot search within just the title field.

      Do I need to reference the title field in some particular way to be able to search the content of just this field?

      Or do I need to use .desc files that repeat the PDF document information content in order to search on the title field, and only the title?



      • #4
        Title won't work like the other meta fields you mentioned. This is because the PDF processor/plugin (pdftotext) outputs the title as a <title>...</title> tag, not as a meta field of the form <meta name="title" ...>.
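To illustrate why the title is missed, here is a simplified model of a meta-field extractor that only collects <meta name="..." content="..."> tags. It is NOT Zoom's actual implementation, just a sketch of the behaviour described above; the sample HTML strings and values are hypothetical.

```python
from html.parser import HTMLParser


class MetaFieldCollector(HTMLParser):
    """Simplified model of a meta-field extractor (not Zoom's real code).

    Only <meta name="..." content="..."> tags are recorded, so a title that
    arrives as <title>...</title> is never seen as a "title" meta field.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return  # <title> and every other tag are ignored
        a = dict(attrs)
        name = a.get("name")
        if name is not None and "content" in a:
            self.fields[name.lower()] = a["content"]


def collect(html_text: str) -> dict:
    parser = MetaFieldCollector()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Hypothetical converted-PDF headers: title as a <title> tag vs. as a meta tag.
as_title_tag = '<html><head><title>Sample Report</title><meta name="author" content="A. Writer"></head></html>'
as_meta_tag = '<html><head><meta name="title" content="Sample Report"><meta name="author" content="A. Writer"></head></html>'

print(collect(as_title_tag))  # {'author': 'A. Writer'}
print(collect(as_meta_tag))   # {'title': 'Sample Report', 'author': 'A. Writer'}
```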

        So yes, if you use a .desc file and format your title as a meta tag, it would work.
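If you have many PDFs, the .desc content can be generated rather than typed by hand. The sketch below builds one <meta name="..." content="..."> line per document information field. It assumes you have already read the fields into a dict (for example with a PDF library, not shown), and it assumes the .desc file sits next to the PDF with the same base name; check the Zoom User Guide for the exact naming rule. The meta names should match the names configured on the Custom Meta Search Fields page; the helper names here are hypothetical.

```python
import html
from pathlib import Path

# Hypothetical list of PDF document information fields to copy into meta tags.
FIELDS = ("title", "author", "subject", "keywords")


def desc_text(info: dict) -> str:
    """Build .desc content with one <meta name=... content=...> line per field.

    Missing or empty fields are skipped; values are HTML-escaped so quotes
    or ampersands in a title do not break the tag.
    """
    lines = []
    for name in FIELDS:
        value = info.get(name)
        if value:
            lines.append(f'<meta name="{name}" content="{html.escape(value, quote=True)}">')
    return "\n".join(lines) + "\n"


def desc_path(pdf_path: Path) -> Path:
    # ASSUMPTION: the .desc file shares the PDF's base name in the same folder.
    return pdf_path.with_suffix(".desc")


def write_desc(pdf_path: Path, info: dict) -> Path:
    """Write the .desc file for one PDF and return its path."""
    out = desc_path(pdf_path)
    out.write_text(desc_text(info), encoding="utf-8")
    return out
```

For example, write_desc(Path("docs/report.pdf"), {"title": "Annual Report", "author": "J. Smith"}) would write two meta lines to docs/report.desc under the naming assumption above.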
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

