PDF In-Depth

PDF Anatomy 101

October 31, 1999


Ok class, today we are going to dissect a PDF document and explore what it's made of.

For an in-depth look at the internal workings of PDF, check out the PDF Reference Manual.

The basic chemistry

A PDF document is made up of three types of objects, or classes of objects. They are:

  • Document objects.
  • Page objects.
  • Content objects.

Cutting things open. Scalpel please!

A document object must contain at least a cross-reference table (xref) and one page object. It can optionally contain such things as named destinations, document info, hidden templates, thumbnails, bookmarks and more.

A page object usually contains at least one content object (there can be more than one). It also holds page information for cropping, logical page numbers and page rotation, along with links, article threads, annotations (including sound and file annotations), form fields, digital signatures, page actions, etc.

A content object can contain only the marking operators and resources (e.g. fonts, images) that make up the background of the page.

Putting things under the microscope

This object-based construction is what makes certain operations possible. Think of it as containers that all fit inside one another in a particular order: a content object is contained within a page object, and a page object (with all the content objects it contains) is contained within a document object. A document object also has information of its own, apart from its page and content objects. There are many advantages to this approach.
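To make the nesting concrete, here is a minimal sketch of the three containers in Python. The class and field names (ContentObject, PageObject, DocumentObject, rotation, form_fields, info) are made up for illustration and are not taken from the PDF Reference Manual or any PDF library. The sketch only mirrors the "containers inside containers" picture described above.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Marking operators and resources (e.g. fonts, images) for a page."""
    operators: list[str] = field(default_factory=list)


@dataclass
class PageObject:
    """Holds content objects plus page-level extras such as form fields."""
    contents: list[ContentObject] = field(default_factory=list)
    rotation: int = 0
    form_fields: list[str] = field(default_factory=list)


@dataclass
class DocumentObject:
    """Outermost container: pages plus document-level information."""
    pages: list[PageObject] = field(default_factory=list)
    info: dict[str, str] = field(default_factory=dict)


# One document, holding one page, holding one content object.
doc = DocumentObject(
    pages=[PageObject(contents=[ContentObject(["BT", "(Hello) Tj", "ET"])])],
    info={"Title": "PDF Anatomy 101"},
)
content_count = sum(len(page.contents) for page in doc.pages)
```

Notice that you can only reach a content object by going through its page, and a page by going through its document. That is the ordering the rest of this lesson relies on.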

For example, when you replace a PDF page, you are removing all the content objects contained within that page object. Other things that apply only to that page, such as cropping and rotation info, are removed as well.

You are also modifying some of the information within the document object, namely the cross-reference table (xref), which is essentially a map of where everything in a PDF file is located, including the replacement page.

However, the rest of the page object remains. This allows things such as form fields and article threads (as well as other enhancements that can be so much work to create) to remain and be used by the replacement page. This can be a real time saver.
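The "map" role of the xref mentioned above can be pictured with a small sketch. It assumes a heavily simplified file in which objects are numbered 1, 2, 3... and written one after another right after the header. The helper name xref_table and the two sample objects are made up for illustration, and a real PDF file carries more structure than this.

```python
def xref_table(objects, header=b"%PDF-1.3\n"):
    """Return ({object number: byte offset}, xref text) for objects written in order."""
    offsets = {}
    position = len(header)  # the first object starts right after the header
    for number, body in enumerate(objects, start=1):
        offsets[number] = position
        position += len(body)
    # Entry 0 is the head of the free list; the rest are in-use ("n") entries.
    lines = ["xref", f"0 {len(objects) + 1}", "0000000000 65535 f "]
    lines += [f"{offsets[n]:010d} 00000 n " for n in sorted(offsets)]
    return offsets, "\n".join(lines) + "\n"


page = b"1 0 obj\n<< /Type /Page >>\nendobj\n"
other = b"2 0 obj\n<< /Type /Catalog >>\nendobj\n"
before, table_before = xref_table([page, other])

# A replacement page of a different size shifts every object after it.
bigger_page = b"1 0 obj\n<< /Type /Page /Rotate 90 >>\nendobj\n"
after, table_after = xref_table([bigger_page, other])
```

Comparing before and after shows why replacing a page touches the document object too: the second object has moved, so the map has to be rewritten to point at its new location.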

Now let's compare this with deleting a PDF page. Deleting a page actually deletes the page object. This means that everything the page object contains is also deleted. Thus, article threads, form fields and more are removed and can no longer be used by another page, and bookmarks for that page become non-functional. Depending on your situation, deleting a page and all that comes with it can also be a big time saver.
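The difference between replacing and deleting can be sketched in a few lines of Python. This is a toy model under the assumptions of this article, not how Acrobat or any PDF library implements these commands. Page, replace_page and delete_page are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    contents: list[str] = field(default_factory=list)
    rotation: int = 0
    form_fields: list[str] = field(default_factory=list)
    article_threads: list[str] = field(default_factory=list)


def replace_page(pages, index, source):
    """Swap in the source page's content and rotation, keeping the target's enhancements."""
    target = pages[index]
    target.contents = list(source.contents)
    target.rotation = source.rotation
    # form_fields and article_threads are deliberately left alone


def delete_page(pages, index):
    """Remove the page object itself, and everything it contains with it."""
    del pages[index]


pages = [
    Page(contents=["old text"], form_fields=["name", "email"]),
    Page(contents=["page two"], article_threads=["story"]),
]

replace_page(pages, 0, Page(contents=["new text"], rotation=90))
fields_after_replace = pages[0].form_fields  # still there for the new content

delete_page(pages, 1)  # the "story" thread goes away with its page
remaining = len(pages)
```

After the replace, the first page shows new content but still carries its two form fields. After the delete, the second page and its article thread are simply gone.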

Classification and documentation

Now that we've discovered the mysteries of PDF construction, it's time to bring it all together in an orderly fashion and to make a permanent record of it. Guess what? It's your lucky day, since I've already taken care of this final step for you. Use the chart that I've designed. You can even print it any size you want.

Anatomy of a PDF Object (25K)
