Skip 1 page of all PDF files?


Ted
05-13-2008, 12:31 PM
Hi,
I've just started playing with Zoom and am finding it to be extremely good. I have one [probably simple] question that deserves a little history first.
We are a document imaging bureau offering an image hosting service to our clients (i.e. they can search for and download documents via the internet using index data that we have captured and hold in a database). Most of our images are produced as PDFs as they can be OCR'd and hence made searchable.

We are looking to offer a full text search facility, which is where Zoom enters the room.
All is working well, but I would prefer it if the first page of each PDF file was not indexed as this is an index sheet that we attach to the front of every file and is not really related to the scanned document itself.

I'm aware of the ZOOMSTOP and ZOOMSTART tags which I could put at the start and end of the first page, but the OCR 'might' not read it correctly (we scan at 200dpi and not 300dpi due to speed and file size constraints) and could mess things up.
The PDFs are all between 10 and 400 pages long and there are potentially tens of thousands of files.

So... is there a configuration option that I haven't found yet that tells Zoom to ignore the first page of every file?

wrensoft
05-13-2008, 02:01 PM
There is no option to ignore the 1st page of a PDF file.

If it was an HTML file you could insert tags to skip part of the document, but it is not possible to insert tags into a PDF file.

My advice would be not to OCR the 1st page and leave the 1st page as an image. If there is no text on the 1st page then Zoom will ignore that page.

Ted
05-13-2008, 03:13 PM
Normally this would be the simplest option, but as we're dealing with thousands of pages each day our OCR system is pretty much fully automated and cannot be configured to ignore any pages.

No worries, it's not a huge issue - and if it becomes one I can automate the removal of the first page from every PDF before the OCR takes place.
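For anyone wanting to try the same thing, here is a rough sketch of how that first-page removal could be scripted. It assumes the third-party pypdf library is installed, and the folder layout and the function names (pages_to_keep, strip_cover_page, strip_folder) are made up for illustration; it is not something Zoom or the OCR system provides. It writes stripped copies to a separate folder and leaves the originals untouched.

```python
from pathlib import Path


def pages_to_keep(page_count: int, skip: int = 1) -> range:
    """Indices of the pages left once the first `skip` pages are dropped."""
    return range(min(skip, page_count), page_count)


def strip_cover_page(src: Path, dst: Path, skip: int = 1) -> int:
    """Copy src to dst without its first `skip` pages; return pages written."""
    # Assumption: pypdf is installed. It is imported here so that
    # pages_to_keep() can still be used on its own without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    writer = PdfWriter()
    kept = pages_to_keep(len(reader.pages), skip)
    for index in kept:
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as handle:
        writer.write(handle)
    return len(kept)


def strip_folder(in_dir: Path, out_dir: Path) -> None:
    """Write a cover-less copy of every PDF in in_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.pdf")):
        strip_cover_page(src, out_dir / src.name)
```

Calling strip_folder(Path("scans"), Path("scans_no_cover")) would process a whole batch, where both folder names are placeholders. A one-page file would come out with no pages at all, so that case may need special handling if it can occur.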

MergeThis
05-13-2008, 07:52 PM
It's ZOOMRESTART, not ZOOMSTART...just in case you need to use it for another issue at some point.


Good luck,
Leon

Ted
05-16-2008, 10:42 AM
Hi Leon,
I was just typing madly away... knew it wasn't quite right ;)

Thanks for the pointer though.
Ted