
OCR, PDFs, and Bates-numbered documents

Ernest Svenson, Editor, PDFforLawyers.com

June 01, 2006


Optical Character Recognition (or 'OCR') is a great tool. As most of you know, a scanned file is basically just an image. Even though the image may show a document full of words, the computer regards those words merely as pixels to display. A word-processing file, by contrast, is an assemblage of characters that the computer recognizes as such, which is why you can run a word search on a text-based document but not on a scanned image, unless you OCR the image file.

When you tell the computer to do OCR, you are asking it to do something very sophisticated. The computer has to analyze each assemblage of pixels to determine which character it most likely represents. The cleaner the pixels, the better the chance that the computer will guess the right character.
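To make the idea of matching pixels to characters concrete, here is a toy Python sketch of one classic OCR technique, template matching. It is an illustration under simplifying assumptions, not a description of how Acrobat works: each glyph is assumed to be an already-isolated 5x5 bitmap, and the templates and function names (`TEMPLATES`, `mismatches`, `recognize`) are made up for the example. An unknown glyph gets the label of the template it differs from in the fewest pixels, so a few stray specks still leave the right answer ahead.

```python
# Toy OCR by template matching (hypothetical example, not Acrobat's method).
# Each known character is a tiny bitmap: '#' = dark pixel, '.' = light pixel.
TEMPLATES = {
    "I": (".###.",
          "..#..",
          "..#..",
          "..#..",
          ".###."),
    "L": ("#....",
          "#....",
          "#....",
          "#....",
          "#####"),
    "T": ("#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."),
}


def mismatches(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return (best_char, score); score is the number of mismatched pixels."""
    best = min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))
    return best, mismatches(glyph, TEMPLATES[best])


# A clean "L" matches its template exactly.
print(recognize(TEMPLATES["L"]))  # ('L', 0)

# An "L" with a few stray pixels, as on a dirty scan, is still read as "L".
noisy_L = ("#....",
           "#..#.",
           "#....",
           "##...",
           "####.")
print(recognize(noisy_L))  # ('L', 3)
```

The mismatch count doubles as a rough confidence score: the dirtier the scan, the higher the count, and the more likely it becomes that the wrong template wins.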

Adobe Acrobat has long had an OCR function, although earlier versions called it "paper capture." Acrobat 6.0 was the first version that, to my mind, handled OCR reliably. Acrobat 7.0 does an even better job, although it introduces quirks elsewhere that I'm not crazy about. In any event, the OCR/Paper Capture function is a great tool in Acrobat because it keeps the page image intact while identifying the characters in that image, so that you can search an entire document set for key words. Obviously, this is a nice tool for litigators who deal with document productions. So even though OCR takes a fair amount of time (roughly 15 seconds per page), it's often worthwhile. Which brings me to reader mail.

Today I got a great question from a reader about a problem he had when he ran the OCR function in Acrobat 6.0:

One question regarding the OCR function -- have you come across the problem where part of a scanned page had "renderable text" but the remainder does not? Apparently Acrobat 6.0 decides that it cannot OCR the remainder of the page, a dialog box appears acknowledging the problem, and you either cancel the OCR or move on to the next page.

This seems to have happened to me in one production where the Bates ranges are text, but the rest of the page is scanned. I'd assume this is because the documents were scanned and then some program like Easy Bates or something similar applied a Bates range to the PDF.

Any thoughts?

I have indeed had problems OCRing documents after I did something to them, such as Bates-stamping them electronically. That fits the reader's hunch: an electronically applied Bates stamp adds real text to the page, which is presumably what Acrobat detects as "renderable text." In other words, OCR works best when done right after the documents are scanned. And of course, as I said before, it works best on clean copies (faxed copies, for example, usually won't give you good results). So, if you intend to OCR your PDF documents, it's best to do that first and then apply the Bates stamp. Of course, if any of you readers out there have other observations, please share them in the comments section. Thanks.
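The order-of-operations advice can be illustrated with a small Python toy model. This is a hypothetical sketch, not Acrobat's API: `Page`, `bates_stamp`, `ocr`, and `searchable` are invented names, and `ocr` simply assumes the behavior the reader describes, skipping any page that already contains text. Stamping first gives every page a text object, so OCR skips them all; OCRing first and stamping afterward leaves both the words and the Bates labels searchable.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy model of a scanned PDF page (hypothetical, for illustration only)."""
    image_words: list[str]                         # words present only as pixels
    text: list[str] = field(default_factory=list)  # searchable text on the page


def bates_stamp(pages, prefix, start=1, width=6):
    """Append a Bates label such as ABC000001 to each page as searchable text."""
    for offset, page in enumerate(pages):
        page.text.append(f"{prefix}{start + offset:0{width}d}")


def ocr(pages):
    """Assumed behavior, modeled on the reader's report: pages that already
    contain text are skipped. Returns the indexes of the skipped pages."""
    skipped = []
    for i, page in enumerate(pages):
        if page.text:
            skipped.append(i)  # page already has text, so it is not OCRed
            continue
        page.text.extend(page.image_words)  # recognized words become searchable
    return skipped


def searchable(pages, word):
    """Return the indexes of pages whose text contains the given word."""
    return [i for i, page in enumerate(pages) if word in page.text]


# Wrong order: stamp first, then OCR.
a = [Page(["contract", "breach"]), Page(["invoice"])]
bates_stamp(a, "ABC")
print(ocr(a))                   # [0, 1]: every page is skipped
print(searchable(a, "breach"))  # []

# Recommended order: OCR first, then stamp.
b = [Page(["contract", "breach"]), Page(["invoice"])]
print(ocr(b))                   # []
bates_stamp(b, "ABC")
print(searchable(b, "breach"))  # [0]
print(b[1].text)                # ['invoice', 'ABC000002']
```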

This piece originally appeared on PDFforLawyers.com, and has been reproduced with permission.
