Topic: Re: Text extraction
Conf: (P-PDF) Developers, Msg: 58628
Date: 5/29/2002 05:33 PM
At 05:14 AM 11/3/2000 -0500, p-pdf-developer Listmanager wrote:
> I would like to extract only the text content from a PDF file. I have
> been using the pdftotext tool from Xpdf, which pretty much does the
> job, but I have been having problems with older PDF files that use LZW
> compression, since LZW is patented. How can I get around this problem?
I don't know what version of pdftotext you are using, but the
latest versions include support for LZW by "piping" the compressed data out
to another process (behind your back). I was one of the people who
contributed that part of the code.
We've taken a similar approach in our SPDF library, due to
licensing problems with those "wonderful" folks at Unisys.
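For anyone wondering what "piping the compressed data out to another process" looks like in practice, here is a rough sketch of the general idea. It is not the actual pdftotext or SPDF code. The function name decode_via_pipe and the DECODER_CMD command are made up for illustration, and the stand-in decoder inflates zlib data rather than LZW so that the example runs anywhere Python is installed. In a real setup you would substitute whatever separately licensed LZW decoder you have available.

```python
import subprocess
import sys
import zlib

# ASSUMPTION: hypothetical stand-in for an external decoder. It is a child
# Python process that reads zlib-compressed bytes on stdin and writes the
# inflated bytes to stdout. A real setup would launch an LZW decoder instead.
DECODER_CMD = [
    sys.executable,
    "-c",
    "import sys, zlib; "
    "sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))",
]


def decode_via_pipe(compressed: bytes, cmd=DECODER_CMD) -> bytes:
    """Send compressed bytes to an external process and return what it writes to stdout."""
    result = subprocess.run(
        cmd,
        input=compressed,        # written to the child's stdin
        stdout=subprocess.PIPE,  # collected from the child's stdout
        check=True,              # raise if the decoder exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # A made-up content-stream fragment, compressed here only for the demo.
    stream = zlib.compress(b"BT (Hello) Tj ET")
    print(decode_via_pipe(stream))
```

The point of the design is that the calling program never contains the decompression code itself. It only hands bytes to another process and reads the result back.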
>Or are there any better ways to get the text from a pdf file like using
>I am using VC++
I don't believe that any of the PDF ActiveX/COM objects support