PDF In-Depth

Online PDFs - What You Get Might Not Be What You See

April 08, 2000

When converting documents to PDF, we expect all attributes in the original documents to be preserved. This is indeed the case, with a few exceptions -- some of which vary depending on the specific Acrobat versions, authoring applications and printer drivers used.

When the PDF file is intended only for print, and not for on-screen viewing or interactive use, we don't have to be concerned with display or other online problems (assuming of course that the print results and quality have been tested and verified).

However, if the PDFs are to be used online, we must keep several additional factors in mind: search/find capabilities, copy/paste results, display quality and text editability (if desired).

You may find that PostScript drivers and Acrobat implementations that are fine for print production are not good enough for online PDFs.

In this article, I will briefly address a few of the problems you may encounter, without discussing additional application-specific aspects. One of my next articles will focus on PDF display quality issues.

Find, Search, Copy and Paste

You have probably heard of, if not encountered in your own files, search, copy and paste problems related to using TrueType fonts in Acrobat. Fortunately, most of these problems have been solved in Acrobat 4, IF the latest AdobePS drivers are used to produce the PS file to be distilled (4.3.1 for Windows 95/98 and 5.1.2 on Windows NT) AND only if Acrobat 4 Reader is used to view the PDF.

Another font-related problem is that some Type 3 fonts might not be searchable; and in some rare cases, even Type 1 fonts are not searchable (and text using them in the PDF might not even be selectable).

Even if you use "perfect" fonts, you might still experience problems with the Find and Search functions as a result of the way the text is stored internally in the PDF. In contrast with your source documents, the PDF can be regarded as a final-format print file. It has no paragraphs and no flow, and is stored as isolated lines of text. Thus, for example, if the phrase you are searching for is split between lines in a page with multiple columns, or split between pages, it will not be located. The same goes for hyphenated words split between lines.
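The effect of storing text as isolated lines can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how Acrobat is implemented: the function names and the sample lines are invented for the example. A search that looks at one stored line at a time misses a hyphenated word and a phrase that wraps, while a search over rejoined text finds both.

```python
def find_per_line(lines, phrase):
    """Return the indices of stored lines that contain the whole phrase."""
    return [i for i, line in enumerate(lines) if phrase in line]


def find_across_lines(lines, phrase):
    """Search after rejoining the lines, undoing end-of-line hyphenation."""
    joined = ""
    for line in lines:
        if joined.endswith("-"):
            # Treat a trailing hyphen as a split word and glue the halves.
            joined = joined[:-1] + line.lstrip()
        elif joined:
            joined += " " + line.strip()
        else:
            joined = line.strip()
    return phrase in joined


# Hypothetical page content, stored the way the article describes:
# one isolated string per printed line.
page_lines = [
    "Distilling the file produces a docu-",
    "ment that can be searched in Acrobat",
    "Reader on any platform.",
]

print(find_per_line(page_lines, "document"))            # []
print(find_per_line(page_lines, "Acrobat Reader"))      # []
print(find_across_lines(page_lines, "document"))        # True
print(find_across_lines(page_lines, "Acrobat Reader"))  # True
```

Note that the rejoining step is naive: it would also glue together words that are legitimately hyphenated at the end of a line, which is one reason this problem is harder than it looks.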

Even Acrobat's treatment of words has its own logic, which has nothing to do with grammar. It treats space and punctuation symbols as separators between words, but it also inspects the physical distance between letters and uses the results to "decide" when a new word begins. Although in most cases the outcome will be as expected, there might be surprises. A common instance of this is evident when letter-spacing is used.

The following example is the PostScript description of sample text that demonstrates this letter-spacing issue. Copy the following lines to a text file, save it as "test.ps" (there is no need to create a PS file through a printer driver) and distill it.

% start

/Helvetica findfont 11 scalefont setfont

48 640 moveto
(Electronic Documentation: 2000) show

48 512 moveto
2 0 (Electronic Documentation: 2000) ashow

showpage

% end

When you inspect the resulting PDF, you will notice that in the second text line, where letter-spacing is applied, there is a single space between each of the letters (select the text and paste it to a text editor to see this interesting "effect"). This will also be noticeable in Acrobat when you select the text by dragging. This is despite the fact that the source text you distilled only has a single space between words, not between letters.

In the above PDF, a single space is inserted between each of the letters. In other cases, for example in the output from applications such as Word and FrameMaker, the spaces added will be inconsistent (depending on factors such as kerning and the specific font in use), so that "2000" might appear as "2 00 0" or "2 000".

When viewing or printing the PDF, this will not be noticeable. However, when using Find/Search, the item you are trying to locate will not be found (unless of course you search for it using the same irregular spaces between letters...).

Depending on the specific driver used, this may also cause the wrong words to be highlighted when you use the search function.

If you use letter-spacing in your source files, check and see what effect it has on your PDFs. I have encountered cases where even though the letter-spacing was applied only to the running header, it rendered the Search function useless for all of the text.
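To make the distance-based word detection more concrete, here is a minimal Python sketch. It assumes a simple rule: a space is inserted whenever the gap between two neighboring glyphs exceeds a fixed threshold. The glyph width, the gap values and the threshold are all invented for illustration; the rule Acrobat actually applies is not documented here and is certainly more involved.

```python
def layout(text, glyph_width, gaps):
    """Place glyphs left to right as (char, x, width) tuples.

    gaps[i] is the extra space added after glyph i, playing the role of
    the ax operand of ashow or of application-level letter-spacing.
    """
    glyphs, x = [], 0.0
    for i, ch in enumerate(text):
        glyphs.append((ch, x, glyph_width))
        x += glyph_width
        if i < len(gaps):
            x += gaps[i]
    return glyphs


def extract_text(glyphs, gap_threshold):
    """Rebuild a string, inserting a space wherever the gap is too wide."""
    out, prev_end = [], None
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > gap_threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x + width
    return "".join(out)


THRESHOLD = 1.0  # hypothetical value, in the same units as the gaps

normal = extract_text(layout("2000", 6.0, [0, 0, 0]), THRESHOLD)
uniform = extract_text(layout("2000", 6.0, [2, 2, 2]), THRESHOLD)
kerned = extract_text(layout("2000", 6.0, [1.2, 0.8, 1.1]), THRESHOLD)

print(repr(normal))        # '2000'
print(repr(uniform))       # '2 0 0 0'
print(repr(kerned))        # '2 00 0'
print("2000" in uniform)   # False: a search for "2000" no longer matches
```

The third case shows why the results can be inconsistent: once kerning makes the gaps unequal, some of them fall just above the threshold and some just below.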

Fake Bold

Several applications let you specify a bold font weight even if you don't have the actual bold font installed. In Microsoft Word, for example, you can select Arial Black and also activate the Bold property. Arial Black Bold? There is no such font. Word in fact creates a "synthetic bold" property for that text. But what happens when you convert the document to a PDF?

With some Microsoft-based PostScript drivers, the synthetic bold property will be ignored altogether, and you will have regular text.

If you use the AdobePS driver, the "bolded" attribute is implemented as multiple instances (usually four) of the same text, repeated with very small offsets.

The bolded text in the PDF will have a strange display effect. If you inspect the screen display using large magnifications, you will probably be able to see multiple layers of the same text. And even if you cannot see the effect, try to edit such text in the PDF file, and you'll have to work your way through editing different layers, which is not practical, to say the least.

Copying and pasting an entire line of "bolded" text will result in each word being pasted a number of times.
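The copy/paste result is easy to reproduce in a small Python model. The sketch below assumes that each word is a text run with a position, and that synthetic bold places four copies of every run at tiny offsets; the offsets, coordinates, tolerance and function names are all hypothetical. It also shows one way a post-processing tool could collapse the duplicates, which is not something Acrobat does for you.

```python
def paste_text(runs):
    """Join text runs in stored order, the way a naive copy would."""
    return " ".join(text for text, x, y in runs)


def collapse_fake_bold(runs, tolerance=0.5):
    """Keep only the first of several identical runs at nearly one position."""
    kept = []
    for text, x, y in runs:
        duplicate = any(
            text == k_text
            and abs(x - k_x) <= tolerance
            and abs(y - k_y) <= tolerance
            for k_text, k_x, k_y in kept
        )
        if not duplicate:
            kept.append((text, x, y))
    return kept


# Four overlaid copies per word, each shifted slightly (assumed offsets).
offsets = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
words = [("Release", 72.0, 700.0), ("Notes", 120.0, 700.0)]
runs = [(w, x + dx, y + dy) for w, x, y in words for dx, dy in offsets]

print(paste_text(runs))
# Release Release Release Release Notes Notes Notes Notes
print(paste_text(collapse_fake_bold(runs)))
# Release Notes
```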

Driver-Level Font Subsetting and Missing Characters

Font subsetting and embedding are often used when distilling PDFs: font embedding so that the text in the PDF preserves its look and feel, and subsetting to keep the PDF file size down. This combination makes sense in almost all cases... unless you need the text to be editable (for example, if you want to have the option of calling your print producer and asking them to change the release date on the cover). Text set in a subset font cannot be edited, since Acrobat does not handle fonts for which it has only partial letter descriptions. So if you need the text to be editable (to the limited level which is possible in Acrobat), subsetting has to be turned off.

And note that there are cases where subsetting is turned off in Distiller's job options, but fonts are nevertheless subset in the PDFs, due to driver-level subsetting.

Driver-level font subsetting takes place with some PostScript drivers. To reduce the amount of data sent to the printer, character descriptions are sent only for those characters actually used. Distilling PS files produced by these drivers (including AdobePS 4.2 - 4.3.1) will therefore result in PDFs where the fonts are subset even though subsetting is not activated in the Distiller job options. Again, this is usually of no special consequence, except that the subset text is not editable in the PDF, if text editability is required.

With PostScript fonts, driver-level subsetting can be disabled by choosing "Don't Send Fonts" (in AdobePS 4.x this is done by accessing the PostScript driver properties, and clicking the PostScript tab, Advanced). The PS file produced will not include the fonts; it will be Distiller's responsibility to access them and handle them according to the job options.

With TrueType fonts, driver-level font subsetting cannot be effectively disabled.

Note that the "Don't Send Fonts" option should not be used with laser printers, because the fonts will not be sent to the printer!

Driver-level subsetting has another important implication, partially resolved in version 4.05: Acrobat is not always aware of it. If you use the Insert/Replace Pages function, taking pages from different PDFs (where the same font was subset differently due to different text content), the fonts are not merged. As a result, characters that are not included in the current font subset disappear from the resulting PDF, with no warning or error message.

When Acrobat 4.05 encounters different subsets for the same font, it does not merge the subsets of the different PDFs, but it at least displays a message saying that it cannot handle different subsets of the same font, and the insert/replace pages operation is not carried out.
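The two behaviors can be modeled with plain Python sets. This is a simplified sketch under stated assumptions: a font subset is treated as nothing more than the set of characters used in a document, and the sample texts and function names are invented. Real font subsets hold glyph descriptions, not characters, but the failure has the same shape.

```python
def subset_for(text):
    """The characters a driver would send for this text (whitespace ignored)."""
    return frozenset(ch for ch in text if not ch.isspace())


def render_with_subset(text, subset):
    """Earlier behaviour as described: missing characters vanish silently."""
    return "".join(ch for ch in text if ch.isspace() or ch in subset)


def insert_page_checked(target_subset, page_text):
    """4.05-style behaviour as described: refuse when the subsets differ."""
    if subset_for(page_text) != target_subset:
        raise ValueError("cannot handle different subsets of the same font")
    return page_text


doc_a_text = "Release date"   # text of the PDF receiving the page
doc_b_text = "Final draft"    # text of the page being inserted

subset_a = subset_for(doc_a_text)
missing = sorted(subset_for(doc_b_text) - subset_a)

print(missing)                                   # ['F', 'f', 'i', 'n', 'r']
print(render_with_subset(doc_b_text, subset_a))  # al dat

try:
    insert_page_checked(subset_a, doc_b_text)
except ValueError as err:
    print("Insert refused:", err)
```

Neither variant merges the two subsets, which matches the description above: the later version only replaces silent character loss with a refusal.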

If you use earlier versions of Acrobat and employ the Insert/Replace Pages function, inspect the results very carefully.
