PDF Text Under Image Format
Zoom v5 worked fine on all .PDFs saved as TEXT AND IMAGE (TAI).
We purchased v6 (Professional), but PDFs saved as TEXT UNDER IMAGE (TUI) are not returning results. The original TEXT AND IMAGE PDFs still return successful search results on the website after reloading the newly indexed .zdat files.
Additionally the Log shows that all PDFs (TAI and TUI) are being indexed but that three TUI files could not be opened [Log reads Skipping C:\Documents and Settings\Seth\My Documents\Website\Mosaic\pdf\m17_01_86_04.pdf (Could not open or read from file)].
Furthermore, the Log does show that the TUI PDFs are being indexed but, again, several searches that should have found these files did not return them.
Anyone familiar with this issue re the saving of PDFs?
We would prefer using the TUI PDFs as they resolve some image resolution issues.
seth j hersh
We don't know exactly what you mean by text under image. I am guessing that you might have a text layer generated using OCR? What tool are you using to make the PDFs?
For the files that had the error, "Could not open or read from file", did you have these files open? Was there any other process on the machine that might have been using these files while indexing was taking place? Does the error happen all the time on the same 3 files? If yes, can you e-mail us one of these files?
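A quick way to check whether files are readable before indexing is to try opening each one. The following is only a minimal sketch, not part of Zoom: the function name `find_unreadable` and the default `*.pdf` pattern are assumptions for illustration. Whether a file held open by another program fails here depends on how that program locked it, so an empty result does not rule out a conflict.

```python
from pathlib import Path


def find_unreadable(folder, pattern="*.pdf"):
    """Return (path, error) pairs for matching files that cannot be opened and read.

    Hypothetical helper for illustration; it is not part of the indexer.
    """
    problems = []
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            with open(path, "rb") as handle:
                # Reading a small chunk is enough to surface most access errors.
                handle.read(1024)
        except OSError as exc:
            problems.append((str(path), str(exc)))
    return problems
```

Calling `find_unreadable` on the folder that holds the PDFs and printing each pair gives a list of files to close or unlock before re-running the index.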
We're using ABBYY FineReader v8 to handle the OCRing and PDF creation.
I must've had the three PDFs open while I was scanning, since I am no longer getting the "Skipping file ..." error. My mistake -- and thanks for alerting me to the obvious problem.
We did locate this other thread on your forum, http://www.wrensoft.com/forum/showthread.php?t=3122, which suggests we shouldn't have any probs...
My Scan Options configuration shows that the .pdf extension is associated with an Acrobat Document. Additionally, I have indexed all the files using each of the three Scan Methods under Scan Options | Scan Extensions | Configure. Somehow I felt the third option, Scan Text by text layer, would fix the issue, but it did not.
To conclude, our Text Under Image mode for creating the PDFs seems to prevent Zoom from indexing the available text layer, whereas our first PDFs (Text And Image) remain indexed and successfully searchable.
Examples of the PDFs which are not being "found" by Zoom can be found at http://www.mosaicsciencemagazine.org/index.php?mode=article&pk_magazine=64. Click the PDF link to open it.
Examples of the PDFs which are being found by Zoom are at http://www.mosaicsciencemagazine.org/index.php?mode=article&pk_magazine=47.
Thanks very much for the prompt feedback, David.
Stop the Presses
Hmmm... I may be the problem here.
I just realized that the indexes I'm getting from v6 may not be going into the same /zoom directory on the server. I found a set of .zdat files in the public_html folder, so I may not be uploading to the correct directory, which would mean my old indexes are still active.
When I moved them to the public_html directory, I got a different response on searching.
Please do not do anything further until I sort things out here. I apologize for the misunderstanding.
I apologize for leading everyone astray, including myself as the Pied Piper.
The issue has been resolved: I was reading the wrong set of .zdat files, having located them in a different directory and forgotten this salient fact after not working on the site for several months.
Zoom was/is working "as advertised" -- and I was working like an idiot. Zoom does a great job at simplifying a complex task.
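For anyone who runs into the same stale-index confusion, a small script can show every directory that holds index files and how recent each set is. This is a rough sketch under assumptions: only the .zdat extension comes from this thread, while the function name `list_index_sets` and the idea of running it against a local copy of the site are illustrative.

```python
from pathlib import Path


def list_index_sets(web_root, pattern="*.zdat"):
    """Return (directory, newest_mtime) pairs for directories containing index files.

    Hypothetical helper; results are sorted newest first so stale copies sort last.
    """
    newest = {}
    for path in Path(web_root).rglob(pattern):
        folder = str(path.parent)
        mtime = path.stat().st_mtime
        # Keep only the most recent modification time seen in each directory.
        newest[folder] = max(mtime, newest.get(folder, 0.0))
    return sorted(newest.items(), key=lambda item: item[1], reverse=True)
```

If more than one directory shows up, the search script is probably reading from one of them while uploads go to another; comparing the timestamps shows which set is current.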