Indexing PDFs and Word Docs - Advice

  • Indexing PDFs and Word Docs - Advice

    Hi - I am using 7.0 build 1010.

    At www.chenetwork.org/dvd
    user: CHEDVD
    pass: Access2015

    We have nearly 7 GB of PDFs and some Word docs, which I've indexed. Our problem is that we get multiple, repetitive search results for the same file. I'm looking for solutions to reduce this, and for suggestions on best practices when indexing such a large body of PDFs and docs.

    Thank you!

  • #2
    I searched for the word "test" and I see that there are duplicate URLs. This is unusual; it shouldn't normally happen.

    Can you tell us:
    a) Whether you have modified "search.php", "settings.php", or any of the .zdat index files generated by Zoom.
    b) Whether you are mixing files from different indexing sessions. Note that "search.php" and "settings.php" are essential files that are part of an index, and files from one session cannot be mixed with files from another.
    c) Whether you are using Offline Mode or Spider Mode. If you send us a copy of your .zcfg configuration file with your indexing configuration, we can take a closer look.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Complete stab in the dark here, but I'm assuming you have several hundred files at least. I had an issue drawn to my attention the other day on our Intranet. There appeared to be several instances of the same file. Visually the file references looked identical, which was odd because the Intranet is configured to treat a file with the same name as a new version of the existing file. On very close examination, some file names had an additional space or two in them. With proportional fonts these days, it was almost impossible to pick up with the naked eye.
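
If stray spaces in file names are a suspect, a short script can flag them faster than reading the list by eye. The following is only a rough sketch of that idea, not anything provided by Zoom: the helper names `normalize_name` and `find_whitespace_variants` and the sample file names are made up for illustration, and it assumes you already have the file names as a list of strings.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", name).strip()


def find_whitespace_variants(names):
    """Group names that become identical once whitespace is normalized."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    # Keep only groups that hold more than one distinct spelling.
    return {
        key: sorted(set(variants))
        for key, variants in groups.items()
        if len(set(variants)) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample: the second name has two spaces between the words.
    sample = ["Annual Report.pdf", "Annual  Report.pdf", "Minutes.doc"]
    for key, variants in find_whitespace_variants(sample).items():
        print(key, "->", [repr(v) for v in variants])
```

The names could come from `os.listdir()` or `os.walk()` on the folder being indexed. Printing with `repr()` shows the extra spaces explicitly, which a proportional font would otherwise hide.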



      • #4
        Ray, I would be happy to send you my config files. Where should I send them? I have two versions of this search: one for a CD, which sits on my local drive in a separate folder that gets burnt to the CD, and one for my web site, which is on another drive and gets uploaded to our web server.



        • #5
          You can send your config files to zoom [at] wrensoft (dot) com.

          Please reference this thread in your email.

          Is the problem occurring for both your CD search and your web site search? Are you using spider mode or offline mode for your web site search?
          --Ray

