Topic: Re: Editing PDF File
Conf: (P-PDF) Developers, Msg: 56954
Date: 5/29/2002 05:22 PM
At 02:39 AM 11/30/2001 +1100, p-pdf-developer Listmanager wrote:
>I'm trying to make a Java program that find some string over the stream
>content of a pdf file.
OK. However, you should know that what you are attempting to do
is NOT trivial, since PDF contents aren't necessarily described in logical order.
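To make that concrete, here is a minimal sketch of why a plain substring search over a content stream can miss text that is clearly visible on the page. The content stream, the font resource name /F1 and the class name are made up for illustration, and the sketch assumes a simple font whose character codes happen to match ASCII.

```java
class NaiveSearchSketch {

    // Hypothetical decoded content stream. On the page it reads "Hello World",
    // but "World" is painted first and "Hello" is split into two pieces
    // inside a TJ array with a kerning adjustment between them.
    static final String CONTENT =
        "BT /F1 12 Tf 130 700 Td (World) Tj ET\n" +
        "BT /F1 12 Tf 100 700 Td [(He) -20 (llo)] TJ ET\n";

    // The naive approach: treat the stream as text and look for the string.
    static boolean naiveContains(String content, String needle) {
        return content.contains(needle);
    }

    public static void main(String[] args) {
        System.out.println(naiveContains(CONTENT, "Hello World")); // false
        System.out.println(naiveContains(CONTENT, "Hello"));       // false
        System.out.println(naiveContains(CONTENT, "World"));       // true
    }
}
```

A real search has to interpret the text operators, track where each piece lands on the page, and then reassemble the pieces in reading order.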
>So far it is only possible when the stream content is not encoded.
That is also an issue. So just decode it! All of the necessary
info is in the PDF specification.
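As a rough sketch of the decoding step in Java, the following handles one common case only: a stream whose /Filter is FlateDecode, with no predictor in its decode parameters. It assumes you have already isolated the raw bytes between "stream" and "endstream"; the class and method names are illustrative, and the deflate helper exists only so the demo has something to decode. Other filters (LZWDecode, ASCII85Decode and so on) need their own decoders.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateDecodeSketch {

    // Inflate the raw bytes of a FlateDecode stream (zlib-wrapped deflate data).
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // default mode expects the zlib wrapper
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the data is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    // Demo helper only: compress bytes the way a PDF producer might.
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = inflate(deflate(content));
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

What comes out is the content stream's operators and operands, not plain readable text; the bytes inside the string operands are character codes in whatever encoding the current font uses.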
>How can I convert this encoded stream content into Ascii?? thanx
You may not be able to, since the text may be in Chinese (for
example), which isn't ASCII. Or it may use a "symbol font" or a specially
constructed font (like a Type 3) which doesn't provide for conversion.
As I noted above, text extraction from a PDF is a non-trivial task,
and you may well be better served by licensing the technology from someone
else. If you do decide to tackle it, your first goal is to read the PDF
specification a few times, especially the sections on text and fonts.