The wordprocessor to PDF blues
By Bryan Guignard
No one can deny the universal use and acceptance of that long-established workhorse, the wordprocessor. In recent years, the Portable Document Format (PDF) has given nearly all computer users a medium that lets them break free of the confines of proprietary file formats and operating systems.
This has been a blessing for many desktop publishers. However, there is an increasing demand for PDF from traditional wordprocessor operators. This is fuelled to a great extent by the move of business to the World Wide Web, and the need to convert countless business documents to PDF.
Despite its popularity, universality, and ease of use, PDF still finds itself at odds with wordprocessors much more often than it should, or needs to. Although PDF paints a pretty rosy picture for desktop publishing (DTP) users and the high-end graphics world, the same cannot be said of PDF's relationship with wordprocessors.
I want to mention at this point that the problems and limitations I'm about to present are not just wordprocessor specific, but actually extend to any non-PostScript-oriented computer program. I will also only be covering the Microsoft® Windows® and Apple® Macintosh® platforms. Other platforms can play by a very different set of rules that may or may not have parallels with what I'll be covering.
To start, let's take a look at the fundamental problems. I'll then cover some of the less fundamental issues - mainly design limitations - and some things that can help improve your chances of success with your wordprocessor to PDF strategy.
The most fundamental problem, and the most difficult to deal with, has to do with the basic differences and incompatibilities between the Adobe PostScript imaging model and the Windows and Macintosh imaging models. The Windows imaging model is called GDI (Graphics Device Interface), and the Macintosh model is called QuickDraw.
It's important to realize that computer hardware and software come with built-in limitations, and the sooner you learn to design your workflows to accommodate these limitations, the sooner you'll be able to produce professional PDF documents from your wordprocessor files. PostScript-oriented applications and Distiller can easily bypass QuickDraw and GDI when creating PDF documents, and are mostly exempt from the problems I'm covering here.
The ideal would be to have your wordprocessor write PDF directly, and hopefully this will be a possibility before too long. But as the situation presently stands, most, if not all, wordprocessors must use an indirect method to produce PDF. Although there are many tools available, such as Ghostscript and a variety of scripts and standalone converters (and no doubt more are being developed as I write this), I will only be covering the tools and methods available with Adobe Acrobat.
QuickDraw and GDI have limitations, and any application that uses QuickDraw or GDI as part of the PDF creation process (mainly non-PostScript business applications) will be subject to these limitations. As well, both differ in some fundamental ways from the PostScript imaging model. These differences can cause some pretty serious problems in your wordprocessor to PDF workflow.
PDFWriter acts as a printer driver which converts the operating system graphics and text commands produced by QuickDraw and GDI into PDF code, and then assembles it into a PDF file. Because of its dependence on QuickDraw and GDI, PDFWriter is also subject to their limitations. In other words, if QuickDraw and GDI introduce limitations or errors, then all PDFWriter can do is pass these errors along to the PDF. The term sometimes used for this in the computer industry is GIGO: Garbage In, Garbage Out.
Here is an example of the workflow I'm referring to.
Macintosh application > QuickDraw > PDFWriter > PDF > Acrobat.
Windows application > GDI > PDFWriter > PDF > Acrobat.
A typical scenario
Here is one common problem that occurs as a result of introducing QuickDraw or GDI into your workflow. Wordprocessors generally do not understand Bezier curves. Say you import a piece of clip art that contains Bezier curves into a Word document. Depending on how you do this, the Bezier curves (usually in an EPS file) may get embedded in the document "as is" initially, but will get converted to polygons when they encounter QuickDraw or GDI on their way to PDFWriter. (That is why Adobe has always recommended using Distiller if your files contain EPS graphics, since this workflow will bypass QuickDraw and GDI completely, and leave your Bezier curves intact.)
Or, the Bezier curves may get converted to polygons right away (if the graphic is brought in through the clipboard, or converted to a graphics format your wordprocessor can edit). Either way, your beautiful Bezier curves are no more, and your graphic may or may not get mangled depending on its complexity - more often than not it will end up looking very jagged, because of the imperfect translation of Bezier curves to the straight lines of polygons.
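To make the loss concrete, here is a minimal Python sketch of what "converting a Bezier curve to a polygon" means. It is only an illustration, not the actual QuickDraw or GDI algorithm: the function names, the sample control points, and the choice of evenly spaced sample points are all my own assumptions. It replaces one cubic Bezier curve with a chain of straight segments and measures how far the true curve strays from that chain.

```python
import math

def cubic_bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return (x, y)

def flatten(p0, p1, p2, p3, segments):
    """Approximate the curve with `segments` straight lines (hypothetical helper)."""
    return [cubic_bezier_point(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def max_deviation(control, segments, samples=1000):
    """Largest gap found between the true curve and its polygon stand-in."""
    poly = flatten(*control, segments)
    worst = 0.0
    for i in range(samples + 1):
        p = cubic_bezier_point(*control, i / samples)
        nearest = min(point_segment_distance(p, poly[j], poly[j + 1])
                      for j in range(segments))
        worst = max(worst, nearest)
    return worst

# An arbitrary example curve, roughly a quarter arc (assumed values).
CURVE = ((0, 0), (0, 55), (45, 100), (100, 100))

if __name__ == "__main__":
    for n in (2, 4, 8, 32):
        print(n, "segments -> max deviation", round(max_deviation(CURVE, n), 4))
```

The fewer segments the conversion uses, the larger the gap between the polygon and the original curve, and the gap only becomes more visible when the page is enlarged. A workflow that keeps the Bezier data intact never has to make that trade-off.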
Some people complain that when they print their wordprocessor documents to a PostScript printer all is beautiful, but when they print to PDFWriter, or even Distiller in some cases, things go wrong. That's because a PostScript printer is not PDF; the two are different in some fundamental ways. A PostScript printer contains what's called a RIP (Raster Image Processor).
The RIP converts anything it receives into a bitmap image that it then instructs the printer to apply to a piece of paper as "paint", or ink dots. Any Bezier curves, vector objects, and text get rendered into a bitmap representation of the page. This is a completely different process from PDF, which uses the original font and vector calculations to produce pages. If you print a PDF file with a PostScript printer, it must still go through the RIP before it can be applied to paper.
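As a rough illustration of the rasterizing step described above, the following Python sketch turns a shape described by a formula (a filled disc) into a grid of dots at a chosen resolution. It is a toy under stated assumptions, not how any real RIP is implemented: `rasterize_disc` and its `pixels_per_unit` parameter are hypothetical names, and the pixel-centre sampling rule is my own simplification.

```python
import math

def rasterize_disc(radius, pixels_per_unit):
    """Render a filled disc into rows of 0/1 dots (hypothetical toy rasterizer).

    A dot is set to 1 when the centre of its pixel lies inside the disc.
    """
    size = int(2 * radius * pixels_per_unit)
    centre = radius  # the disc sits in a square page of side 2 * radius
    rows = []
    for j in range(size):
        row = []
        for i in range(size):
            # Page coordinates of this pixel's centre.
            x = (i + 0.5) / pixels_per_unit
            y = (j + 0.5) / pixels_per_unit
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            row.append(1 if inside else 0)
        rows.append(row)
    return rows

def coverage(bitmap):
    """Fraction of dots that are set."""
    total = sum(len(row) for row in bitmap)
    return sum(sum(row) for row in bitmap) / total

if __name__ == "__main__":
    # The same disc at a coarse resolution, drawn with text characters.
    for row in rasterize_disc(1.0, 8):
        print("".join("#" if dot else "." for dot in row))
```

Once the page is a grid of dots, the resolution is baked in: the coarse bitmap cannot be enlarged without showing its stair-steps. The formula it came from, like the font and vector data kept in a PDF, can be rendered again at any size.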
The path from wordprocessor document to PDF document is a bit like adding apples and oranges. It's actually a lot more like translating from one language to another. For example, the English language has the word "it" to denote a genderless state (such as an object). This cannot be translated into a language like French, because French knows nothing of a genderless state.
In French, everything has a gender. A boat is masculine and a swimming pool is feminine, for example. "It" simply does not translate into French, because French has no such concept, and this can cause a lot of miscommunication. The situation is pretty similar when going from wordprocessors to PDF.
If wordprocessors were to become fluent in PostScript, which is the underlying imaging model for PDF, then they would simply become DTP applications. This brings me to the next thing I want to present to you: design limitations.