PDF In-Depth

Navigating the Internal Structure of a PDF Document

Sample PDF File



The following is a very minimal, but real, PDF file. It's been shortened for display purposes by removing a large piece of the XMP metadata stream. All COS object types are represented here, but there is only one stream object, the XMP metadata. Can you find it?
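If you'd rather not hunt by eye, a few lines of Python can do the looking. This is a rough sketch, not a PDF parser. It assumes that the keyword endobj never appears inside stream data, and that each stream object is written as a dictionary immediately followed by the stream keyword, as in this sample. SAMPLE is an abbreviated copy of some of the objects from the sample file that follows, with the XMP packet trimmed further. find_stream_objects is a hypothetical helper.

```python
import re

# Abbreviated copy of some objects from the sample file (XMP packet trimmed further).
SAMPLE = b"""1 0 obj<</Count 1/Kids[7 0 R]/Type/Pages>>
endobj
5 0 obj<</Pages 1 0 R/Type/Catalog/Metadata 21 0 R>>
endobj
9 0 obj null
endobj
21 0 obj<</Length 3193 /Type/Metadata/Subtype/XML>>
stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?><?xpacket end='w'?>endstream
endobj
"""

# "num gen obj ... endobj"; assumes "endobj" never occurs inside stream data.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)
# A stream object's body is a dictionary immediately followed by "stream".
STREAM_RE = re.compile(rb"\s*<<.*?>>\s*stream\b", re.S)


def find_stream_objects(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation) for each object whose body is a stream."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in OBJ_RE.finditer(data)
            if STREAM_RE.match(m.group(3))]


print(find_stream_objects(SAMPLE))  # [(21, 0)]
```

Every other object here is a plain dictionary or the null object; only object 21, the XMP metadata, carries a stream.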

The cross-reference table (the 'xref' list at the bottom of the file) has entries for 22 object numbers, 0 through 21: the '0 22' line at the head of the table means the entries begin at object number 0 and there are 22 of them. However, the file contains only 7 objects (numbers 1, 3, 5, 7, 9, 14 and 21). Table entries that reference existing objects are marked with 'n', and unused, or free, entries are marked with 'f'. For each 'n' entry, the first number is the object's byte offset from the beginning of the file. For example, the location of Object #1 (1 0 obj) is given by the line '0000000016 00000 n', or 16 bytes from the beginning of the file.

This is the first object in the file: the Pages Dictionary. It is not, however, the first object in the PDF object hierarchy. That object is the Document Catalog, Object #5 (5 0 obj), at offset 169. In general, objects are not required to appear in any particular or useful order. If you insert or delete any characters in this file, you will invalidate several of the offsets in the cross-reference table, potentially making the file unreadable in Acrobat. It's a good idea never to edit a PDF file in a text editor.
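Because every cross-reference entry has the same fixed layout (a 10-digit number, a 5-digit generation number, and 'n' or 'f'), the table can be decoded mechanically. The sketch below embeds the entries from the sample file. The first line is the subsection header: the starting object number followed by the number of entries. For free entries, the PDF specification uses the first field as the number of the next free object, so the free entries form a chain that starts at entry 0 and ends by pointing back to 0; free_chain follows it. parse_xref and free_chain are hypothetical helpers, and this handles only a single subsection.

```python
# The cross-reference entries from the sample file. The first line is the
# subsection header: starting object number, then number of entries.
XREF = """\
0 22
0000000002 65535 f
0000000016 00000 n
0000000004 00001 f
0000000066 00000 n
0000000006 00001 f
0000000169 00000 n
0000000008 00001 f
0000000229 00000 n
0000000010 00001 f
0000000353 00000 n
0000000011 00001 f
0000000012 00001 f
0000000013 00001 f
0000000015 00001 f
0000000373 00000 n
0000000016 00001 f
0000000017 00001 f
0000000018 00001 f
0000000019 00001 f
0000000020 00001 f
0000000000 00001 f
0000000394 00000 n
"""


def parse_xref(text: str) -> dict[int, tuple[int, int, str]]:
    """Map object number -> (first field, generation, 'n' or 'f')."""
    lines = text.splitlines()
    first, count = (int(x) for x in lines[0].split())
    entries = {}
    for i, line in enumerate(lines[1:1 + count]):
        field, gen, kind = line.split()
        entries[first + i] = (int(field), int(gen), kind)
    return entries


def free_chain(entries: dict[int, tuple[int, int, str]]) -> list[int]:
    """Follow free entries from object 0; each 'f' entry names the next free object."""
    chain, nxt = [], entries[0][0]
    while nxt != 0:
        if nxt in chain:
            raise ValueError("cycle in free list")
        chain.append(nxt)
        nxt = entries[nxt][0]
    return chain


entries = parse_xref(XREF)
in_use = {num: off for num, (off, _gen, kind) in entries.items() if kind == "n"}
print(len(entries))         # 22
print(in_use)               # {1: 16, 3: 66, 5: 169, 7: 229, 9: 353, 14: 373, 21: 394}
print(free_chain(entries))  # [2, 4, 6, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
```

The seven 'n' entries and the fourteen free objects on the chain, plus entry 0 itself, account for all 22 entries.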

%PDF-1.5
%âãÏÓ
1 0 obj<</Count 1/Kids[7 0 R]/Type/Pages>>
endobj
3 0 obj<</ModDate(D:20040819171125-07'00')/CreationDate(D:20040819161955-07'00')/Title(MyFile)>>
endobj
5 0 obj<</Pages 1 0 R/Type/Catalog/Metadata 21 0 R>>
endobj
7 0 obj<</Type/Page/Parent 1 0 R/Rotate 0/MediaBox[0.0 0.0 612.0 792.0]/CropBox[0.0 0.0 612.0 792.0]/Resources<<>>>>
endobj
9 0 obj null
endobj
14 0 obj null
endobj
21 0 obj<</Length 3193 /Type/Metadata/Subtype/XML>>
stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?><?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-14, framework 1.6'><rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:iX='http://ns.adobe.com/iX/1.0/'>
</rdf:RDF></x:xmpmeta><?xpacket end='w'?>endstream
endobj
xref
0 22
0000000002 65535 f
0000000016 00000 n
0000000004 00001 f
0000000066 00000 n
0000000006 00001 f
0000000169 00000 n
0000000008 00001 f
0000000229 00000 n
0000000010 00001 f
0000000353 00000 n
0000000011 00001 f
0000000012 00001 f
0000000013 00001 f
0000000015 00001 f
0000000373 00000 n
0000000016 00001 f
0000000017 00001 f
0000000018 00001 f
0000000019 00001 f
0000000020 00001 f
0000000000 00001 f
0000000394 00000 n
trailer
<</Size 22/Root 5 0 R/Info 3 0 R/ID[<9ea8983ff0d0394493c818f521bf950c><31ff1de0d464094cb2f6e18a012e3fbf>]>>
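To see why inserting or deleting even one character invalidates the cross-reference table, the sketch below writes a few objects after a header, records each object's byte offset the way an xref table would, and then checks whether each recorded offset still lands on the matching "num 0 obj" line. build_objects and stale_offsets are hypothetical helpers; the output is not a complete PDF (there is no xref table or trailer), it only models the offset bookkeeping. Readers vary in how they cope with a stale table (some attempt to repair it), so this is an illustration of the problem rather than of any reader's behavior.

```python
import re


def build_objects(objects: dict[int, bytes]) -> tuple[bytes, dict[int, int]]:
    """Serialize 'num 0 obj ... endobj' blocks after a header, recording offsets.

    Hypothetical helper: it writes no xref table or trailer, so the result is
    not a complete PDF; it only models the offset bookkeeping.
    """
    out = bytearray(b"%PDF-1.5\n")
    offsets = {}
    for num, body in objects.items():
        offsets[num] = len(out)  # byte position where this object starts
        out += b"%d 0 obj" % num + body + b"\nendobj\n"
    return bytes(out), offsets


def stale_offsets(data: bytes, offsets: dict[int, int]) -> list[int]:
    """Return object numbers whose recorded offset no longer lands on 'num 0 obj'."""
    return [num for num, off in offsets.items()
            if not re.match(rb"%d 0 obj" % num, data[off:])]


objects = {
    1: b"<</Count 1/Kids[7 0 R]/Type/Pages>>",
    5: b"<</Pages 1 0 R/Type/Catalog>>",
    7: b"<</Type/Page/Parent 1 0 R/MediaBox[0 0 612 792]>>",
}
data, offsets = build_objects(objects)
print(stale_offsets(data, offsets))    # []

# Insert one space just before object 5: it and every later object shift by a byte.
edited = data[:offsets[5]] + b" " + data[offsets[5]:]
print(stale_offsets(edited, offsets))  # [5, 7]
```

Object 1 is untouched because it comes before the edit; every object after the insertion point is now one byte away from where the table says it is.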

Getting tired yet? There are times when you need to look at the raw data in the file, but generally it isn't very useful. The structure of a PDF file does not match the structure of the PDF document it describes. For the most part, the file looks like a list of unconnected object definitions; the connections between them are indirect references such as '5 0 R' in the trailer's /Root entry. To get a sense of how it all fits together, you need to look at it another way.
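One way to look at it another way is to start at the trailer's /Root entry and follow the indirect references down the page tree: Catalog, then the Pages dictionary, then each page in its /Kids array. The sketch below does this with regular expressions over dictionaries copied from the sample file (objects 9 and 14, which are null, and the stream data are left out). It is not a real PDF parser: ref, kids, type_of and walk are hypothetical helpers that ignore generation numbers and assume /Kids holds a flat list of references, as it does here. /Parent links are deliberately not followed, since they point back up the tree.

```python
import re

# Dictionaries from the sample file, keyed by object number.
OBJECTS = {
    1: "<</Count 1/Kids[7 0 R]/Type/Pages>>",
    3: "<</ModDate(D:20040819171125-07'00')/CreationDate(D:20040819161955-07'00')/Title(MyFile)>>",
    5: "<</Pages 1 0 R/Type/Catalog/Metadata 21 0 R>>",
    7: "<</Type/Page/Parent 1 0 R/Rotate 0/MediaBox[0.0 0.0 612.0 792.0]"
       "/CropBox[0.0 0.0 612.0 792.0]/Resources<<>>>>",
    21: "<</Length 3193 /Type/Metadata/Subtype/XML>>",
}
TRAILER = "<</Size 22/Root 5 0 R/Info 3 0 R>>"


def ref(body, key):
    """Object number of an indirect reference '/Key n g R' in body, or None."""
    m = re.search(r"/%s\s+(\d+)\s+\d+\s+R" % key, body)
    return int(m.group(1)) if m else None


def kids(body):
    """Object numbers listed in a /Kids array, in order."""
    m = re.search(r"/Kids\s*\[([^\]]*)\]", body)
    return [int(n) for n in re.findall(r"(\d+)\s+\d+\s+R", m.group(1))] if m else []


def type_of(body):
    """Value of the /Type entry, or '?' if there is none."""
    m = re.search(r"/Type\s*/(\w+)", body)
    return m.group(1) if m else "?"


def walk(num, depth=0, out=None):
    """Depth-first walk: Catalog -> /Pages -> /Kids, one indented line per object."""
    out = [] if out is None else out
    body = OBJECTS[num]
    out.append("  " * depth + f"obj {num}: {type_of(body)}")
    pages = ref(body, "Pages")
    for child in ([pages] if pages is not None else kids(body)):
        walk(child, depth + 1, out)
    return out


for line in walk(ref(TRAILER, "Root")):
    print(line)
# obj 5: Catalog
#   obj 1: Pages
#     obj 7: Page
```

Seen this way, the order of objects in the file stops mattering: the hierarchy comes entirely from the references, starting at the trailer.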
