Planet PDF Forum Archive

Topic: Re: Strange character set when copying text to Word
Conf: (P-PDF) General, Msg: 50315
From: picax
Date: 5/29/2002 04:37 PM

Mark Zempel wrote:
> the first line should read ZERO %
> Any help will be greatly appreciated
> Mark Zempel
> ------------------------------------------------------------------------
> Name: textsample.pdf
> textsample.pdf Type: Portable Document Format (application/pdf)
> Encoding: base64

Hi Mark,

This is a curious PDF. The file was created or manipulated with the
PitStop application, and internally all of its text has been heavily
altered. I will explain this complex structure and the reasons for the
strange character behaviour:

A. When you open the document in the Acrobat viewer, all characters
appear in the same font style. Internally that is not the case: the
file uses 18 different fonts (see the menu File > Document Info > Fonts).

B. Every font has its own 'strange' font encoding, so this simple PDF
contains 18 different font encodings (a font encoding is the
correspondence between a 'character name' and its ASCII code). For
example: in the font called 'FMNBBE+F19726528.0' the character '9' has
the ASCII code '1', when the standard ASCII code for it is '57'.

C. All the characters also have their internal names altered. For
example: in the same font, 'FMNBBE+F19726528.0', the character '9' has
the name 'c57', when the standard character name is 'nine'.

D. For these reasons the text displays correctly only in the Acrobat
viewer. When you 'copy and paste' the characters, you only 'transport'
these font-specific character codes, and the disaster appears in the
word processor.
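
To make points B to D concrete, here is a minimal Python sketch. It is
only an illustration, not a real extraction tool: the table
FONT_ENCODING and both function names are hypothetical, and only the
entry 1 -> 'c57' comes from the sample file (the other two entries are
made up). It also assumes that every altered glyph name has the form
'c' followed by the standard decimal ASCII code, which I have only
checked for 'c57'.

```python
import re

# Hypothetical per-font encoding table: character code -> glyph name.
# Only 1 -> "c57" is taken from the sample PDF; 2 and 3 are invented.
FONT_ENCODING = {1: "c57", 2: "c48", 3: "c37"}


def naive_copy(codes):
    """What copy and paste transports: the raw codes, read as ASCII."""
    return "".join(chr(code) for code in codes)


def recover_text(codes, encoding):
    """Decode codes through glyph names, assuming 'c<decimal ASCII code>'."""
    chars = []
    for code in codes:
        name = encoding.get(code, "")
        match = re.fullmatch(r"c(\d+)", name)
        # Unknown codes or names become U+FFFD instead of a guess.
        chars.append(chr(int(match.group(1))) if match else "\ufffd")
    return "".join(chars)


codes = [1, 2, 3]
print(repr(naive_copy(codes)))             # control characters, not digits
print(recover_text(codes, FONT_ENCODING))  # readable text
```

With the table above, the naive copy gives the control characters
'\x01\x02\x03', while the name-based decoding gives '90%'. A real tool
would need one such table for each of the 18 fonts.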

Have you tried using a plug-in to extract the PDF text? Do you know
whether it also extracts only strange characters?

If you can't resolve the problem and you are interested in finding
a tool to extract the text correctly, let me know, and I can
develop one in a few days.

I hope this can help you.

Marc Antoni Malagarriga
