Planet PDF Forum Archive


Topic: extract text and check embedded fonts?
Conf: (P-PDF) Developers, Msg: 69603
From: rogerkaplan
Date: 7/31/2002 07:30 AM

I'm trying to replace a PDF workflow that is several years old. There are two things I need to do:
1. Roughly extract the text from the PDF, in order to feed it to a search engine.
2. Examine the file to verify that all fonts (other than the base 14) are embedded (subsetting is OK, but substitution is not).

Can someone recommend a cheap/easy way of doing this?

I've seen various attempts at (1) already discussed, and it appears that compression and various word placement artifacts make this nontrivial to program directly (shame!). The existing code uses an old version of the Adobe API, but we'd like to stop using it going forward because of Adobe's server licensing stance.
Existing libs seem to be the way to go, but given that what I'm doing is somewhat trivial (IMO) and most libs seem to focus on generating PDFs rather than reading them, I'm having trouble finding a free/cheap/shareware solution.
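
To make the difficulty in (1) concrete, here is a rough sketch of the "program it directly" approach using only the Python standard library. It is an illustration under heavy assumptions, not a real extractor: it only understands FlateDecode-compressed or uncompressed streams, it only picks up literal `(...)` strings inside `BT ... ET` text blocks, it ignores fonts, encodings, hex strings and object streams, and it makes no attempt to work out word spacing from glyph positions. The names `rough_text` and `unescape` are made up for this example.

```python
import re
import zlib

# Raw stream bodies; the EOL before "endstream" is left in and tolerated below.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects in a content stream are bracketed by the BT and ET operators.
TEXT_BLOCK_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# Literal strings; nested unescaped parentheses are NOT handled (crude on purpose).
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
ESCAPE_RE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)
SIMPLE_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def unescape(body: bytes) -> bytes:
    """Resolve backslash escapes inside a PDF literal string."""
    def repl(match):
        token = match.group(1)
        if token[:1] in b"01234567":          # octal character code
            return bytes([int(token, 8) & 0xFF])
        return SIMPLE_ESCAPES.get(token, token)  # \( \) \\ map to themselves
    return ESCAPE_RE.sub(repl, body)


def rough_text(pdf_bytes: bytes) -> str:
    """Return one line of text per BT..ET block found in any stream."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the trailing EOL after the compressed data
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate-compressed, or a filter this sketch ignores
        for block in TEXT_BLOCK_RE.finditer(data):
            pieces = [unescape(s) for s in LITERAL_RE.findall(block.group(1))]
            if pieces:
                # Latin-1 is a guess; real text depends on the font's encoding
                chunks.append(b"".join(pieces).decode("latin-1"))
    return "\n".join(chunks)
```

Even on simple files this runs words together or splits them, because the spacing lives in positioning operators rather than in the strings themselves, which is the "word placement artifacts" problem mentioned above.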

I'm also hoping that (2) can be done easily by inspecting the file. Before I tackle the PDF spec, I'd like to know whether it's a wild goose chase...
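
For (2), the inspection idea can be sketched as follows. In the PDF format an embedded font program is referenced from the font descriptor through a `/FontFile`, `/FontFile2` or `/FontFile3` entry, and a subset font carries a six-uppercase-letter tag plus `+` in front of its `/BaseFont` name. The sketch assumes that some PDF parser has already resolved each font dictionary into plain Python dicts keyed by PDF names; that representation and the function names `font_status` and `unembedded_fonts` are hypothetical, chosen only for illustration.

```python
import re

BASE14 = frozenset({
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")
SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")  # e.g. "ABCDEF+Garamond"


def font_status(font: dict) -> str:
    """Classify a font dict as 'embedded', 'subset', 'base14' or 'missing'."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        return "embedded"  # glyphs are content streams stored in the PDF itself
    if subtype == "/Type0":
        # Composite font: the font program hangs off the descendant CIDFont.
        descendants = font.get("/DescendantFonts") or []
        return font_status(descendants[0]) if descendants else "missing"
    name = font.get("/BaseFont", "").lstrip("/")
    descriptor = font.get("/FontDescriptor") or {}
    if any(key in descriptor for key in FONT_FILE_KEYS):
        return "subset" if SUBSET_TAG_RE.match(name) else "embedded"
    return "base14" if name in BASE14 else "missing"


def unembedded_fonts(fonts: list) -> list:
    """Names of fonts that a viewer would have to substitute."""
    return [f.get("/BaseFont", "?") for f in fonts if font_status(f) == "missing"]
```

A file would pass the check when `unembedded_fonts` returns an empty list for every font dictionary reachable from its page resources; collecting those dictionaries is the part that still needs a real parser.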
