Planet PDF Forum Archive


Topic: Reading the content of the PDF documents
Conf: (P-PDF) Developers, Msg: 32882
From: senavaraiyan
Date: 1/16/2002 08:59 PM

Hi friends,

I am working on a task that reads the content of PDF documents, creates a text file from it, and tries to reproduce the original document after some internal operations on that text file.

In that process, I found that the font information is extracted and
appears before the text that is shown with the ( )Tj operator.
The text is set up with text operators such as Tf, Td, Tc, Tw and Tm.
I would like to know more about how the text matrix (Tm) works,
and also where the font size is specified.

With help from a friend, I learned that the font size can be read from the a and d operands of Tm, which is written as a b c d e f Tm.

Going by this, in some places the size in Tm (the text matrix) is exactly what we specified, for example:

/F1 1 Tf
12 0 0 12 208.86 600.78 Tm
0 Tc
0 Tw
(Sample Text)Tj

Here I used a font size of 12 points.

But in other places the size is slightly smaller or larger than what we specified, for example:

/TT2 1 Tf
10.02 0 0 10.02 288.06 600.78 Tm
( )Tj
-19.7665 -1.5269 TD
-0.0016 Tc
0.0031 Tw
[(Tex)-4.6(t)0.8( in cou)-4.6(r)-3.9(ier New )6(1)-4.6(4)1.4( po)-4.6(in)-4.6(t )]TJ

Here the size I used was 10 points, but it shows 10.02.

/F1 1 Tf
13.98 0 0 13.98 208.86 585.48 Tm
0.0009 Tc
0 Tw
(Sample Text )Tj

Here I used a size of 14 points, but it shows 13.98.

Can anyone please tell me why this is happening?
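
A minimal sketch of how the numbers above relate, assuming an identity CTM and no rotation: the rendered size is the Tf size multiplied by the vertical scale of the text matrix, so "/TT2 1 Tf" with "10.02 0 0 10.02 ... Tm" gives 10.02 points. The second function tests one possible explanation for the odd values, which is only a guess: that the producing application rounds lengths to whole device units. The 1200 dpi figure and both function names are assumptions made for illustration.

```python
import math

def effective_font_size(tf_size, tm):
    """Rendered size = Tf size times the vertical scale of the text matrix.

    tm is the six Tm operands (a, b, c, d, e, f). Assumes an identity CTM.
    """
    a, b, c, d, e, f = tm
    return tf_size * math.hypot(c, d)

def snap_to_grid(points, dpi=1200):
    """Round a length in points to the nearest whole device unit.

    The dpi value is a hypothetical device resolution, not taken from the file.
    """
    units = round(points * dpi / 72)
    return units * 72 / dpi

# Values taken from the content-stream samples in this post.
print(effective_font_size(1, (12, 0, 0, 12, 208.86, 600.78)))
print(effective_font_size(1, (10.02, 0, 0, 10.02, 288.06, 600.78)))

# Nominal sizes chosen in the authoring application.
for nominal in (10, 12, 14):
    print(nominal, "->", round(snap_to_grid(nominal), 4))
```

Under that assumption a nominal 10 points lands on 167 device units and 14 points on 233, which matches the 10.02 and 13.98 seen above, while 12 points falls exactly on the grid and stays 12.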

Also, the last two operands of Tm, e and f, are always floating-point numbers.
How can we tell from them where the text will be shown?
Are they the X,Y coordinates directly, or do we need to do further calculation to get the coordinates?
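
A rough sketch of the position arithmetic, assuming an identity CTM: e and f are the translation part of the text matrix, so they give the text origin in user space units (1/72 inch by default). A following "tx ty TD" does not use page units. It moves the start of the next line by (tx, ty) in text space, which means the offsets are scaled by the current matrix. The helper names below are hypothetical.

```python
def matmul(m, n):
    """Multiply two PDF matrices given as (a, b, c, d, e, f), in row-vector order."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def apply_td(line_matrix, tx, ty):
    """New text line matrix after 'tx ty TD' (or Td): [1 0 0 1 tx ty] x Tlm."""
    return matmul((1, 0, 0, 1, tx, ty), line_matrix)

# Matrix set by '10.02 0 0 10.02 288.06 600.78 Tm' in the sample above.
tlm = (10.02, 0, 0, 10.02, 288.06, 600.78)
print("origin after Tm:", tlm[4], tlm[5])

# The sample then issues '-19.7665 -1.5269 TD'.
tlm = apply_td(tlm, -19.7665, -1.5269)
print("origin after TD:", round(tlm[4], 2), round(tlm[5], 2))
```

If the page also sets a CTM with the cm operator, the resulting origin would have to be multiplied by that matrix as well to get page coordinates.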

2. The content streams of PDF documents are compressed with standard algorithms, one of which is FlateDecode. I am able to decompress the content of the stream and get, for example:
[(Tex)-4.6(t)0.8( in cou)-4.6(r)-3.9(ier New )6(1)-4.6(4)1.4( po)-4.6(in)-4.6(t )]TJ
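
For reference, a minimal sketch of the decompression step using Python's standard zlib module. It assumes the raw bytes between "stream" and "endstream" have already been extracted, that /FlateDecode is the only filter, and that no predictor is set in /DecodeParms. The function name is hypothetical, and the sample bytes are compressed here only so that the example is self-contained.

```python
import zlib

def inflate_stream(raw_bytes):
    """Decompress the raw bytes of a stream whose only filter is /FlateDecode."""
    return zlib.decompress(raw_bytes)

# Stand-in for bytes read from a PDF file: compress a small content stream.
sample = b"BT\n/F1 1 Tf\n12 0 0 12 208.86 600.78 Tm\n(Sample Text)Tj\nET\n"
raw = zlib.compress(sample)

print(inflate_stream(raw).decode("latin-1"))
```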

I want to know the significance of the numbers interleaved with the contents, and how the whole text comes to be split into smaller parts, for example:

The actual text is:
Text in CourierNew 14 point
and it is split as:
[(Tex)-4.6(t)0.8( in cou)-4.6(r)-3.9(ier New )6(1)-4.6(4)1.4( po)-4.6(in)-4.6(t )]TJ
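
A small sketch of reading such an array: in a TJ array the strings are shown in order, and each number between them is a position adjustment in thousandths of a text space unit that is subtracted from the current horizontal position, so a negative number moves the next string to the right. The parser below is a simplification with hypothetical names. It assumes literal strings without escaped or nested parentheses and ignores hex strings.

```python
import re

# Either a literal string "(...)" or a number; simplified, see assumptions above.
TOKEN = re.compile(r"\(([^()\\]*)\)|([-+]?(?:\d+\.?\d*|\.\d+))")

def parse_tj_array(array_src):
    """Return (joined text, list of adjustments) for the operand of a TJ operator."""
    strings, adjustments = [], []
    for match in TOKEN.finditer(array_src):
        if match.group(1) is not None:
            strings.append(match.group(1))
        else:
            adjustments.append(float(match.group(2)))
    return "".join(strings), adjustments

src = "[(Tex)-4.6(t)0.8( in cou)-4.6(r)-3.9(ier New )6(1)-4.6(4)1.4( po)-4.6(in)-4.6(t )]TJ"
text, adjustments = parse_tj_array(src)
print(repr(text))

# Shift in points for each adjustment, assuming a 10.02 point effective size
# and 100% horizontal scaling.
for adj in adjustments:
    print(adj, "->", round(-adj / 1000 * 10.02, 4), "pt")
```

Joining the strings recovers the original sentence, which suggests the splits only mark where the producer nudged glyph positions, for example for kerning or justification.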

If anybody could figure out these things and explain them to me, that would help me a lot.

Thanks in advance.

