Topic: OCR compatibility with tag tree/searching
Conf: (P-PDF) PDF Accessibility, Msg: 134325
From: amdickson
Date: 6/11/2005 12:46 AM
Hello.
I'm trying to create accessible, searchable PDFs from paper texts.
I scan them as 300 dpi bitonal TIFFs and then perform OCR using OmniPage Pro 11. Then I save back as PDF with image on text. Retaining the image is important for these archival documents.
The first problem I'm having is with the tag tree. In Acrobat 6.0 Professional, I use the Add Tags to Document option in the Accessibility portion of the Advanced menu. This invariably creates a Figure tag for the page image and a Table tag for the page of text. Each line of text (or portion thereof) is given a Paragraph tag. Also, the graphic zones that I've created in OmniPage are not recognized as Figures in Acrobat. This, to me, makes the tag tree relatively useless. But the Web Accessibility people at my institution teach that "an accessible PDF is a tagged PDF."
Do you have any experience with this or know of any solution? Do other OCR programs work better with Acrobat?
Also, some of my documents are math papers with formulas. The powers that be want the formulas to be expressed in natural language for screen readers and also to be searchable by their LaTeX. Any suggestions on where to hide all of this text? My current (but untested) solution is to put the natural language in the OCR layer and the LaTeX in Bookmarks, although this seems less than ideal, and I'm not sure how it will work across platforms.
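To make the formula requirement concrete, here is a rough Python sketch of the two parallel texts I would need to keep for each formula. The names (FormulaText, find_by_latex) and the sample formulas are made up for illustration. None of this is something Acrobat or OmniPage provides; it is only a picture of the data that has to live somewhere in the PDF.

```python
from dataclasses import dataclass


@dataclass
class FormulaText:
    """Two parallel texts for one formula on a scanned page (hypothetical record)."""
    page: int
    spoken: str  # natural-language reading, intended for screen readers
    latex: str   # LaTeX source, intended for searching


def find_by_latex(formulas, fragment):
    """Return the page numbers whose LaTeX source contains the given fragment."""
    return [f.page for f in formulas if fragment in f.latex]


# Made-up sample data, for illustration only.
formulas = [
    FormulaText(3, "x squared plus y squared equals r squared", r"x^2 + y^2 = r^2"),
    FormulaText(7, "the integral from zero to one of x d x", r"\int_0^1 x\,dx"),
]

print(find_by_latex(formulas, r"\int"))
```

In other words, a reader searching for a LaTeX fragment should land on the right page, while a screen reader should only ever hear the spoken version.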
Thanks in advance for your ideas.