Acrobat Capture 3.0x: Adobe Fires Back, Nearly Misses
Document conversion expert test drives long-awaited major upgrade
By Duff Johnson, Planet PDF Contributing Editor
Review Date: August 24, 2000
Adobe Acrobat Capture 3.0
From its start in 1994, Acrobat Capture 1.01 was a visionary product. In those earliest days of the commercial Internet, the decision to develop Acrobat Capture offered the promise of unifying paper and electronic source materials in a single, compact, standardized, multi-platform environment. This lofty aim helped propel and fulfill the PDF concept, making possible the modern reality: PDF as the globally accepted "full-fidelity" electronic document format.
The only document-imaging product in the Adobe lineup, Capture is aimed at the "production" volume imaging market. Unique among OCR software developers, Adobe recognized that fully automated processes and text-accuracy-only correction sub-systems would leave conversions to PDF/Normal files -- i.e., files made from original electronic sources -- hopelessly out of reach.
The original model for Capture was a faux client-server arrangement - a central processing core for automated functions and a workstation application for value-added (text accuracy, page layout and image enhancement) work. Capture 3.0x expands on this premise with beefed-up automated processing and a true client-server arrangement for the value-added labor.
Gone is the simple "flowchart" dialog. Capture 3.0x sports a busy new interface (screengrab 1) with a Windows Explorer-esque workflow builder mated to a raft of processing information. There is a new Zone tool, simple and reasonably functional, and a superb new QuickFix text correction tool. Capture Reviewer got a major upgrade as well, although the changes are subtle compared to the overall application.
One impressive new feature in Capture 3.0x is true load-balanced distributed processing for the OCR and other automated engines. Add a processor to the Clusters, and watch throughput go up as other processors come on-line. It's stable too, with far fewer crashes than the 2.01 software.
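To illustrate why adding processors raises throughput under load balancing, here is a minimal sketch. It is not Adobe's scheduler; the greedy "least-loaded worker" rule, the function name makespan, and the per-page costs are all assumptions made up for the example.

```python
import heapq

def makespan(page_costs, n_workers):
    """Total elapsed time when each page goes to whichever worker is free first.

    Hypothetical model: page_costs are seconds of OCR work per page,
    and every worker processes pages at the same speed.
    """
    loads = [0.0] * n_workers          # accumulated work per worker
    heapq.heapify(loads)
    for cost in page_costs:
        lightest = heapq.heappop(loads)        # least-loaded worker
        heapq.heappush(loads, lightest + cost)
    return max(loads)                  # the job ends when the busiest worker ends

# Hypothetical per-page OCR times, in seconds.
pages = [2.0, 1.0, 3.0, 1.5, 2.5, 1.0, 2.0, 3.0]

for n in (1, 2, 4):
    print(n, "worker(s):", makespan(pages, n), "seconds")
```

In this toy model the elapsed time for the batch drops as workers are added, which is the behavior described above; real gains depend on network and disk overhead that the sketch ignores.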
Exception handling has been beefed up, but is still weak compared to full-fledged volume imaging systems such as Input Accel and Ascent Capture. Capture includes a scanner-control front-end, but while aimed at the volume scanning market, the software is still best used to process existing scanned, enhanced, rotated and quality-controlled images.
A moving target throughout the development process, OCR in the Capture 3.0 engine promised a substantial improvement over the aged engine in Capture 2.01 -- although we have yet to see this promise fulfilled in the beta 3.0x code. Tuning issues with the 3.01 engine aside, the new Capture should be substantially more accurate overall, particularly on clean documents and small, clear type.
The OCR issue we encounter more than any other, however, is a frequent refusal to capture individual text blocks, particularly those located near images. This problem resulted in a very serious flaw in the 3.0 release - large text areas going without OCR at all. Adding a template zone stage to the workflow helps, but does not fully correct the problem. We chose not to use the 3.0 code for Searchable Image production. An additional concern is the font recognition in the new engine, which we evaluate (so far) as worse than Capture 2.01 in terms of font size and spacing.
New Tools: The Zone Tool
Long-awaited, the new Zone tool (screengrab 2) does what every Zone tool does, and not much more, which is a pity with the PDF-directed Capture. After all, we're not just talking OCR here! We would like to see the ability to control the output of each image zone, as is offered with PageGenie, not to mention a powerful table-recognition tool, as in Scansoft's TextBridge software. Additionally, the engine tends to crash on pages with a great many zones - as always, your mileage may vary!
It frequently seems necessary to include a full-page "text zone" step in each workflow to force the OCR on bitmap regions that appear as images to the software. Missing this step can result in a hard-to-detect failure to OCR in Searchable Image pages - another problem Adobe says they will attempt to correct in the release version of Capture 3.01. We still get mixed results on selected pages, and I suspect that this type of failure to OCR will remain a significant annoyance, as it is with most OCR software.
New Tools: QuickFix
This new tool has a unique interface which brings sorting rudiments to the process of OCR error correction. The concept allows for some pretty sophisticated text-correction routines. QuickFix (screengrab 3) allows you to sort suspects, permitting a correction focus on terms to be added to a dictionary, or alphanumeric words of low confidence. It's primarily useful for Searchable Image text correction.
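The sorting idea can be sketched in a few lines. This is not QuickFix's actual logic or data model; the Suspect record, the 0.8 confidence threshold, and the sample words are assumptions invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Suspect:
    word: str
    confidence: float    # hypothetical OCR confidence, 0.0 to 1.0
    in_dictionary: bool  # whether the word is in the spelling dictionary

def is_alphanumeric_mix(word):
    """True for words that mix letters and digits, e.g. part numbers."""
    return any(c.isdigit() for c in word) and any(c.isalpha() for c in word)

def dictionary_candidates(suspects):
    """Alphabetic words missing from the dictionary, most frequent first."""
    counts = {}
    for s in suspects:
        if not s.in_dictionary and s.word.isalpha():
            counts[s.word] = counts.get(s.word, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))

def low_confidence_alphanumerics(suspects, threshold=0.8):
    """Mixed letter/digit words below the threshold, least confident first."""
    hits = [s for s in suspects
            if is_alphanumeric_mix(s.word) and s.confidence < threshold]
    return sorted(hits, key=lambda s: s.confidence)

# Hypothetical suspects from one OCR pass.
suspects = [
    Suspect("Acrobat", 0.95, False),
    Suspect("Acrobat", 0.90, False),
    Suspect("PDFWriter", 0.85, False),
    Suspect("3O1", 0.40, False),
    Suspect("B2B", 0.70, False),
    Suspect("the", 0.50, True),
]

print(dictionary_candidates(suspects))
print([s.word for s in low_confidence_alphanumerics(suspects)])
```

Grouping suspects this way lets an operator add a recurring term to the dictionary once, then concentrate on the mixed letter-and-digit words the engine was least sure of.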
Improved Tools: Capture Reviewer 3.0
Already a reasonably mature tool in 2.01, Reviewer 3.01 (screengrab 4) is the last word (so far) in layout and appearance management for PDF/Formatted Text and Graphics conversions. A central irritation is that screen redraws are slower, which hampers productivity.
Valuable new features include:
- The ability to set text styles, although they apply only to the current document; style sets can't be saved.
- A new align objects tool, which is VERY useful.
- Efficiency enhancements in several tools. In order to insert a new line of text in 2.01, you had to insert a text box, then a text line, and then edit the text with the text tool. In 3.01, you simply draw a line with the text tool and add your text.
- Table lines are easier to draw with the improved rectangle and line tools, and the lines are handled consistently in the PDF output.
- You can now fill document information fields in the final PDF from within Reviewer.
- You can insert more image types as pictures than in 2.01.
- You can now add Web links and links that go to other pages in the same document. However, you can't edit link options. All links will be invisible, inverted, and go to full page view.
The Overall Experience
I saved this for last, because the reader should not be distracted from the many fantastic improvements to this ground-breaking third-generation software.
HOWEVER! Out of the box, we found Capture 3.0 virtually unusable. From botched OCR (a problem mostly cured with a forced Zone operation) to basic image quality in skew-adjusted images, we could not put Capture 3.0 in production at all. Our extensive testing of the forthcoming 3.01 code promises much better things to come.
A major advantage to Capture 3.0x is in file management. The workflow design will automatically move your files from one step to the next. This saves a lot of the administrative overhead for high-volume production compared with Capture 2.01.
One sour note: As of the last beta version, Capture 3.0x is unusable for color output. Images are highly compressed and, in PDF/Formatted Text & Graphics output, there is gray shading surrounding text and images. We certainly hope Adobe addresses this issue, because we won't be Capturing in color unless they do!
Are there alternatives to Capture for converting paper to PDF? In a word, yes. PageGenie is capable of fine Searchable Image PDF files, and can optionally use PDFWriter to access the Adobe Libraries, if required for the project. Most other alternatives offer far better OCR than Capture 2.01, and most are faster and more stable as well.
The key advantages of Adobe's new Capture product lie in its scalability, stability and the degree to which it will reward the sophisticated user. Just as important is the fact that Adobe's Capture remains the only post-OCR text/layout/images editor (Capture Reviewer) that makes true layout reproduction possible.
For high-volume processing or PDF/Formatted Text & Graphics conversions, you may come to appreciate this software.
Since the mid-1997 release of Acrobat Capture 2.0 (and the must-have 2.01 maintenance update), options for paper-to-PDF conversion have multiplied like mushrooms.
Some other currently available applications include:
- FineReader, by ABBYY Software House
  Awesome OCR from Russia, an advanced desktop OCR engine with PDF/Searchable
- PageGenie, by Paravision Imaging, Inc.
  Powerful and flexible, this station-based software is a strong competitor to Acrobat Capture.
- Prime OCR, by Prime Recognition, Inc.
  Another pricey client-server model, with an extraordinarily accurate voting OCR system.
- TextBridge, by Scansoft, Inc.
  Scansoft has released a new version since this review. (I reviewed this software earlier for Planet PDF.)
- TypeReader, by ExperVision, Inc.
  A basic OCR engine with the ability to write out PDF files.
This list is not intended to be comprehensive.
Several full-scale imaging software solutions that include PDF creation are: