PDF In-Depth

Getting text out of your PDF

April 24, 1999

Is there any way to select text off of a pdf file and copy it to a Word document? All I have is Acrobat Reader, and it won't let me select anything.

In many cases, you can copy the text from a PDF file and paste it. But there are a number of cases where you might not be able to. It isn't always easy to tell what is going on, so first I'll describe what happens when it works, then cover why it might not work.

Before you can copy text at all, you need to select it. The trick is that Acrobat contains a number of different "tools". The default is the "hand" tool, which is used only for scrolling. To select text, you need to switch to the text selection tool.

In Acrobat Reader 3, the text selection tool's icon is the letters "abc" surrounded by a dotted line, while in Reader 4 it is the letter "T" with a dotted box next to it.

TIP: In Reader 4, some of the tools fold out. Click and hold the mouse on the text selection tool, and more related tools will appear. This is important - without it you'll miss some of them. If a tool has a tiny triangle in the bottom right of its icon, it will fold out. This is new in Reader 4. You can also use Edit > Select All to select all of the text at once, rather than dragging over it.

You drag the mouse to select the text, and then copy it (Edit > Copy from the menus, Ctrl+C in Windows, or Command+C on the Macintosh). What could go wrong? Actually, quite a few things. So, watch out for these problems.

It's different in a browser.

In a browser you can still use the text selection tool, but the usual ways of copying don't work. That's because the browser sees the copy instruction, but doesn't bother to mention it to Acrobat! Luckily, Adobe thought of this and added a special COPY button to the Reader toolbar. It appears only when the PDF is viewed in a browser. The button shows two tiny pages, side by side. Select All is not available when viewing a PDF in a browser.

It isn't text at all.

You can see it - there on the page - text. What else could it be? Actually, it could be a picture. The text could be a series of shapes which look like letters, or the page could be a scan, so the text is actually a bitmap. In these cases, the text selection tool will not find anything to select. Unfortunately, there's not much you can do.

It copies, but it's complete junk.

Some ways of making a PDF file will give you hopelessly jumbled fonts. Think of it this way: most fonts have all of the letters a, b, c and so forth. You can put them in a grid; for most fonts, the letters will always appear in the same place. But some fonts have the letters all over the place. Acrobat has no way to know that this has been done, so it just copies the letters that you'd get for a normal font. This often happens when creating a PDF document in Windows, using TrueType fonts, and Acrobat Distiller. Before Distiller sees the fonts, they are already jumbled up. Sometimes just a few characters may be junk - these might be in a different font, or use a special character not available to other programs. Again, there isn't much you can do.
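
To make the "grid" idea concrete, here is a toy sketch in Python. It is not how Acrobat works internally; the font table, the codes and the function names are all invented for illustration. It only shows why the page can look right while the copied text is junk.

```python
# Toy model only: the codes, the font table and the function names are
# made up for illustration and do not reflect Acrobat's real internals.

def standard_glyph(code: int) -> str:
    """In a 'normal' font, a code maps to the usual character for that code."""
    return chr(code)

# A hypothetical jumbled font: the right letter shapes, stored at odd codes.
JUMBLED_FONT = {33: "H", 34: "e", 35: "l", 36: "o"}

def render(codes, font):
    """What you see on the page: the shapes the font draws for each code."""
    return "".join(font[c] for c in codes)

def copy_text(codes):
    """What a copy yields if the codes are read as if the font were normal."""
    return "".join(standard_glyph(c) for c in codes)

page_codes = [33, 34, 35, 35, 36]
seen = render(page_codes, JUMBLED_FONT)   # looks fine on screen
copied = copy_text(page_codes)            # junk on the clipboard

print(seen)    # Hello
print(copied)  # !"##$
```

The page stores only the codes. Without extra information telling it which letter each code stands for, a viewer can only guess, and for a jumbled font the guess is wrong.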

I can't even select text.

Sometimes, the text select tool is greyed out and can't be used. This happens with "secure" PDF files. The creator of any PDF can protect it - choosing whether or not to allow copying (and printing). If a document is protected, you would have to contact the copyright holder and ask for an unprotected copy to use. They might agree, or might want a fee. Many people forget that almost everything on the web is copyright, whether or not it is secure, and whether or not it has a copyright notice.
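
For the curious, here is a rough sketch of how these permissions are recorded. In the PDF specification's standard security scheme, a secured file carries a number whose individual bits switch features on or off: bit 3 allows printing and bit 5 allows copying text. The function names and the sample values below are my own, chosen for illustration.

```python
# Sketch of the permission flags in the PDF standard security handler.
# Bits are numbered from 1 at the low-order end, as in the specification.
# The function names and sample values are illustrative assumptions.

PRINT_BIT = 1 << 2  # bit 3, value 4: printing allowed
COPY_BIT = 1 << 4   # bit 5, value 16: copying text and graphics allowed

def can_print(permissions: int) -> bool:
    return bool(permissions & PRINT_BIT)

def can_copy(permissions: int) -> bool:
    return bool(permissions & COPY_BIT)

# The value is usually stored as a negative number because the unused
# high bits are all set to 1.
print_and_copy = -44   # low bits ...010100: print and copy allowed
locked_down = -64      # low bits ...000000: neither allowed

print(can_print(print_and_copy), can_copy(print_and_copy))  # True True
print(can_print(locked_down), can_copy(locked_down))        # False False
```

When the copy bit is switched off, a viewer that honours the setting disables its text selection tool, which is why the tool appears greyed out.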

The text is in columns.

Acrobat doesn't understand columns, so trying to copy text from columns can be painful - it just reads right across the page. But there is an easy work-around. Just hold the Ctrl key (Windows) or Alt/Option key (Macintosh) when selecting the text, and you will find you can drag around any rectangular area. In Acrobat 4, there's even a special tool for this, but remember that you have to fold it out from the regular text selection tool.
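
Here is a small Python sketch of why the rectangle helps. It assumes, purely for illustration, that a page is a list of words with x and y positions; the Word type, the sample page and both function names are hypothetical, not anything Acrobat exposes.

```python
from typing import NamedTuple

class Word(NamedTuple):
    x: float   # distance from the left edge of the page
    y: float   # distance from the top of the page
    text: str

# A made-up two-column page: the left column at x=0, the right at x=100.
PAGE = [
    Word(0, 0, "Left-1"), Word(100, 0, "Right-1"),
    Word(0, 10, "Left-2"), Word(100, 10, "Right-2"),
]

def read_across(words):
    """Read line by line, left to right, ignoring columns."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in ordered)

def read_rectangle(words, x_min, x_max, y_min, y_max):
    """Keep only the words inside a dragged rectangle, then read those."""
    inside = [w for w in words
              if x_min <= w.x <= x_max and y_min <= w.y <= y_max]
    return read_across(inside)

print(read_across(PAGE))                   # Left-1 Right-1 Left-2 Right-2
print(read_rectangle(PAGE, 0, 50, 0, 10))  # Left-1 Left-2
```

Reading straight across interleaves the two columns, while limiting the selection to one column's rectangle keeps its lines together.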

But it's dozens of pages!

Acrobat Reader can copy only one page at a time. Acrobat Exchange (the commercial program) can copy the whole file, but only on Windows, and only if you have enough memory (and patience). But if you need to do this, perhaps you should reconsider; it's almost always better to go back to the original file, if you can.
