Planet PDF Forum Archive

Topic: why deleting objects does not clear them
Conf: (P-PDF) Developers, Msg: 59906
From: DanielAri
Date: 5/29/2002 05:42 PM

Pete, you seem to have discovered the bottom line: closing and
re-creating the doc is necessary. I will take a stab at _why_ it is
necessary for those who are curious.

Here is my guess at what Acrobat is doing. I do not work for Adobe and
have never seen the code for any internal part of Acrobat, but I know the
PDF file format pretty well, so treat this as an educated guess.

When you create a new file, Acrobat creates some structures, including a
table that lists the location of every indirect object in that PDF file.
It is much like the xref table in a PDF file, but with additional fields
that indicate whether the object is in the file, in a temp file, or in
memory. Don't be fooled: even a PDF file with no pages still has a few
objects that must exist, such as the root dictionary (the document
catalog), the root of the page tree, and a few others. So I bet that
Acrobat creates these objects as well.
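To make the guess concrete, here is a minimal Python sketch of such a table. Everything in it is hypothetical: the three `Location` values come from the description above, while the class and function names are illustrative assumptions, not Acrobat internals. The only hard fact it encodes is that even an empty document needs a catalog whose required /Pages entry points at a page tree root.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict

# Hypothetical model of the guessed object table; not Acrobat's actual code.

class Location(Enum):
    # Where an object's current data lives (categories from the post).
    IN_FILE = "file"
    TEMP_FILE = "temp file"
    MEMORY = "memory"

@dataclass
class Ref:
    """An indirect reference, written '1 0 R' in a PDF file."""
    num: int
    gen: int = 0

@dataclass
class Entry:
    location: Location
    value: Any  # the object itself when in memory; an offset otherwise

@dataclass
class ObjectTable:
    entries: Dict[int, Entry] = field(default_factory=dict)
    next_num: int = 1

    def add(self, value: Any) -> Ref:
        # Register a new in-memory indirect object and return a reference to it.
        ref = Ref(self.next_num)
        self.entries[ref.num] = Entry(Location.MEMORY, value)
        self.next_num += 1
        return ref

def new_document() -> ObjectTable:
    # Even a document with no pages needs a page tree root and a catalog.
    table = ObjectTable()
    pages = table.add({"/Type": "/Pages", "/Kids": [], "/Count": 0})
    table.add({"/Type": "/Catalog", "/Pages": pages})
    return table
```

Calling `new_document()` yields a table that already holds two entries before any page exists, which is the "little more than one malloc" point made next.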

This is why creating a new document takes a little more time than calling
malloc once for a memory structure.

Now you copy your pages from the other doc. This creates several indirect
objects for the page, links, bookmarks, fonts (hopefully without duplicating
fonts that are already in the doc), and more. Acrobat can find every one of
these indirect objects because they are listed in that table.
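Copying a page amounts to copying the graph of objects reachable from it into the destination table. The Python sketch below assumes a toy representation (dicts, lists, strings, and a hypothetical `Ref` class for indirect references, with a list standing in for the object table) and uses a memo keyed by source object number, which is one way to get the "hopefully not making multiple copies" behaviour for a shared font. It is an illustration, not how Acrobat does it; a real page insertion would also rewire each page's /Parent to the destination page tree instead of copying the source tree, so the sample pages here simply omit /Parent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass(frozen=True)
class Ref:
    """Hypothetical indirect reference: object number only."""
    num: int

def copy_objects(src: Dict[int, Any], root: Ref, dst: List[Any],
                 memo: Dict[int, Ref]) -> Ref:
    """Copy everything reachable from `root` in `src` into `dst`.

    `dst` is a toy object table: an object's number is its index + 1.
    `memo` maps source object numbers to already-copied destination
    references, so an object shared by several pages is copied once.
    """
    if root.num in memo:
        return memo[root.num]
    # Reserve the slot before copying children so reference cycles terminate.
    dst.append(None)
    new_ref = Ref(len(dst))
    memo[root.num] = new_ref
    dst[new_ref.num - 1] = _copy_value(src, src[root.num], dst, memo)
    return new_ref

def _copy_value(src: Dict[int, Any], value: Any, dst: List[Any],
                memo: Dict[int, Ref]) -> Any:
    # Recurse into containers; follow indirect references via copy_objects.
    if isinstance(value, Ref):
        return copy_objects(src, value, dst, memo)
    if isinstance(value, dict):
        return {k: _copy_value(src, v, dst, memo) for k, v in value.items()}
    if isinstance(value, list):
        return [_copy_value(src, v, dst, memo) for v in value]
    return value
```

With two source pages that each point at their own content stream and share one font object, copying both pages with the same `memo` produces five destination objects rather than six.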

Here is where it gets interesting and slightly more conjectural. When you
delete those pages, Acrobat surely removes references to them from the
pages tree. However, even though the indirect objects for the page (the
page dict, the contents stream, and more) are no longer referenced by the
pages tree, they may still be listed in the table of all indirect
objects. Thus, when the file is written, Acrobat steps through the table
and writes every indirect object to disk. Acrobat does not realize that
some of those objects are not referenced anywhere in the document
structure (other than that table, and hence the xref table of the saved
file), leaving you with bloat. Garbage collection probably gets rid of
some of this.
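The difference between "write everything in the table" and garbage collection can be sketched as a mark-and-sweep pass over the object graph. This is a hypothetical Python model, not Acrobat's code: `write_all` stands for the guessed behaviour of stepping through the table, while `write_collected` keeps only objects reachable from the catalog by following indirect references. `EXAMPLE_TABLE` is made-up data in which page 3 has already been removed from /Kids but is still in the table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Set

@dataclass(frozen=True)
class Ref:
    """Hypothetical indirect reference: object number only."""
    num: int

def write_all(table: Dict[int, Any]) -> Set[int]:
    # Guessed behaviour: every entry still in the table is written out.
    return set(table)

def _refs_in(value: Any) -> Iterator[int]:
    # Yield the object number of every Ref nested inside a PDF-like value.
    if isinstance(value, Ref):
        yield value.num
    elif isinstance(value, dict):
        for v in value.values():
            yield from _refs_in(v)
    elif isinstance(value, list):
        for v in value:
            yield from _refs_in(v)

def write_collected(table: Dict[int, Any], root: Ref) -> Set[int]:
    """Mark phase: collect object numbers reachable from the root.

    Anything not marked is garbage and would be swept (not written)."""
    seen: Set[int] = set()
    stack = [root.num]
    while stack:
        num = stack.pop()
        if num in seen or num not in table:
            continue
        seen.add(num)
        stack.extend(_refs_in(table[num]))
    return seen

EXAMPLE_TABLE = {
    1: {"/Type": "/Catalog", "/Pages": Ref(2)},
    2: {"/Type": "/Pages", "/Kids": [Ref(5)], "/Count": 1},
    3: {"/Type": "/Page", "/Parent": Ref(2), "/Contents": Ref(4)},  # deleted
    4: "old content stream",
    5: {"/Type": "/Page", "/Parent": Ref(2), "/Contents": Ref(6)},
    6: "new content stream",
}
```

Here `write_all(EXAMPLE_TABLE)` returns all six object numbers, while `write_collected(EXAMPLE_TABLE, Ref(1))` returns {1, 2, 5, 6}. The deleted page's /Parent back-pointer does not keep it alive, because reachability only follows references outward from the root.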

If you are curious, you could make two files: one with fresh pages in it,
and another in which some other pages are inserted and deleted before the
same pages as in the first file are inserted. Read through both files and
see which objects exist in the second file but not in the first.
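For that comparison, here is a rough Python sketch that counts indirect objects by /Type in each file and reports what the second file has in excess. It assumes both files are saved uncompressed with no object streams (those arrived with PDF 1.5), and it uses plain regular expressions rather than a real PDF parser, so binary stream data or a nested /Type (for example an inline font dictionary) could skew the counts. Because object numbers will differ between the two files, it compares counts per type rather than numbers.

```python
import re
from collections import Counter

# Matches "N G obj ... endobj". Assumes an uncompressed PDF without object
# streams; binary stream data could in principle confuse this scan.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")

def object_types(pdf_bytes: bytes) -> Counter:
    """Count indirect objects by the first /Type in their text ('untyped' if none)."""
    counts: Counter = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        m = TYPE_RE.search(match.group(3))
        counts[m.group(1).decode("latin-1") if m else "untyped"] += 1
    return counts

def extra_objects(fresh: bytes, reused: bytes) -> Counter:
    # Types (and how many) present in the insert-then-delete file beyond the fresh one.
    return object_types(reused) - object_types(fresh)
```

Usage would be something like `extra_objects(open("fresh.pdf", "rb").read(), open("reused.pdf", "rb").read())`, where both file names are placeholders. Content streams usually carry no /Type, so leftover page contents show up under "untyped".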

Best Regards,
Dan-Ari Feinberg