PDF In-Depth

Understanding PDF Metadata

March 03, 2005


Several months back, we had some discussion here about an article in Law Technology that warned of the dangers of transmitting metadata in work documents -- whether they are Word or PDF files. Without rehashing the entire thing, I will weigh in with the observation (to paraphrase Dilbert) that as with nuclear power -- and stupidity -- metadata can be used for good or evil (and you don't want to get any on you).

Working Definition

Metadata is simply data about something, whether it is an object, person, or other data -- an extremely broad and flexible concept. For example, a book can be said to contain within its covers both data (the text) and metadata (e.g., the front matter including ISBN, publication date, and all of the other stuff that only librarians understand). This metadata goes with the book, wherever it goes.

If you write notes -- either in a notebook or in the book's margins -- your notes could be considered metadata. When the librarian notes that the book has marks in it, that becomes part of the library's metadata about that particular book; so does the history of who checked the book out, for how long, etc. Amazon.com will have other metadata about the book (that is, the same title, not the same exact book) -- book reviews, links to other books and materials ("people who bought this book also bought..."), prices, shipping information -- a stream of metadata spews from Amazon as from a virtual firehose. Similarly, every vendor of the book that creates a record of the sale -- all the way down to the guy at the yard sale who notes that he sold it for 50 cents -- creates metadata, both about the title and about individual books. This metadata obviously doesn't become part of the book -- it lives in someone's system. (Metadata can become part of the data -- like the links to this blog become part of the blog itself. That's less likely to happen in the physical world.)

Metadata is useful in a particular context

Metadata is useful to the people and systems that keep it, primarily within the context of a particular system. Electronic networks make it possible to create, store, and manipulate huge amounts of metadata. For those of us old enough to recall checking out library books by writing our name on an index card kept in a pocket inside the back cover, consider the amount and quality of metadata pertaining to a single copy of a library book compared to 25 years ago.

This ability to create and store vast amounts of metadata is no accident -- you can only keep track of virtual stuff by collecting data about it. We just need to be aware of what it is, what it says, and how to control its dissemination.

Much has been written about the dangers of metadata in law firm documents, and in particular about the potential hazards of the metadata in Microsoft Word documents. The primary reason the metadata recorded by MS Word can become a problem is that Word records most of it invisibly. We need to be aware that it exists, and to make the invisible visible.

Word was designed to generate all this metadata in order to make it possible for users to create a document, revert to earlier versions, collaborate, and merge the work of several people into a single file. These are all useful things, and users don't want to be bothered with doing and tracking all of these functions. After all, that's what computers are for. In the context of a business document, the metadata functions of Word make perfect sense.

Even in the context of a legal practice, these metadata functions make sense -- after all, lawyers need handy writing tools as much as anyone. The metadata in your work product makes it possible to automate your practice, to file and find your own work product according to subject, author, or date. You can add lots of useful metadata to a PDF or other document manually, or through an automated process that you design.
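As a rough illustration of filing and finding work product by subject, author, or date, here is a minimal Python sketch. The `DocumentRecord` class, the `find` helper, and the sample file names are all hypothetical, invented for this example; none of them are part of Word, Acrobat, or any document management product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical metadata record for one piece of work product."""
    path: str
    subject: str
    author: str
    created: date


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Sample data, invented for illustration only.
RECORDS = [
    DocumentRecord("lease_draft.pdf", "real estate", "A. Smith", date(2005, 1, 14)),
    DocumentRecord("nda_final.pdf", "contracts", "B. Jones", date(2005, 2, 2)),
    DocumentRecord("lease_final.pdf", "real estate", "B. Jones", date(2005, 2, 20)),
]

matches = find(RECORDS, subject="real estate", author="B. Jones")
print([r.path for r in matches])
```

The point is only that once the metadata exists in a consistent form, finding a document becomes a matter of matching fields rather than remembering where it was saved.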

So don't be afraid of metadata -- just be aware that it is generally the invisible creation and storage of metadata, and the failure to scrub it before it goes out of the office, that can bite you. There are two major issues regarding metadata that I will address in future posts: first, the metadata we might want to create ourselves to make our jobs easier; and second, how to control the metadata so we don't disclose things we shouldn't.
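To make the idea of scrubbing concrete, here is a minimal conceptual sketch in Python. It treats a document's metadata as a plain dictionary and keeps only the fields on an allowlist. The field names, the `SAFE_FIELDS` set, and the `scrub` function are assumptions for illustration; this is not how Word or Acrobat actually stores or removes metadata.

```python
# Fields considered safe to disclose; this allowlist is an assumption.
SAFE_FIELDS = {"title", "subject"}


def scrub(metadata, safe_fields=SAFE_FIELDS):
    """Return a copy of the metadata containing only allowlisted fields."""
    return {k: v for k, v in metadata.items() if k in safe_fields}


# Hypothetical internal metadata for a document about to leave the office.
internal = {
    "title": "Settlement Proposal",
    "subject": "Smith v. Jones",
    "author": "A. Associate",
    "last_edited_by": "B. Partner",
    "revision_count": 17,
}

outgoing = scrub(internal)
print(outgoing)
```

An allowlist is used here rather than a list of fields to remove, so that any field nobody thought about is withheld by default instead of disclosed by default.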

This piece originally appeared on PDFforLawyers.com, and has been reproduced with permission.
