Topic: PDF to Text Conversion
Conf: (P-PDF) Developers, Msg: 92133
Date: 7/16/2003 07:38 AM
I am looking for a way to convert PDF documents into a Unicode text
file. The process needs to support having a separate text file for
each page. Also, I need to write an application around the process so
PDF documents can be converted in bulk, driven from a database.
Methods I have considered for processing these documents include the
following:
1. Straight PDF to Text conversion/extraction. All of these tools
either don't support Unicode fonts or corrupt the text output on
2. Using a text printer driver to simply print a PDF file and convert it
into a text file. I am unable to find a print driver that supports
Unicode. I may have missed an obvious setting in some of these.
3. Converting a PDF to RTF and then to TXT. While this seems to work,
it isn't as efficient as I would like. Also, I have to break the PDF
into smaller PDFs, one for each page, before I convert to RTF using the
Save As function in Adobe. Although I am sure I could optimize this
method, I would prefer to find something more efficient.
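
To make the requirement concrete, here is a rough Python sketch of the
wrapper application I have in mind: one UTF-8 text file per page, with
the batch driven from a database. It is only an outline under several
assumptions. The `documents` table, its `path` and `converted` columns,
the output file naming, and the `extract_pages` callable are all
hypothetical. In particular, `extract_pages` stands in for whichever
PDF-to-text tool turns out to handle Unicode correctly; the sketch does
not solve the extraction problem itself.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical plug-in point: given a PDF path, return one Unicode string per page.
PageExtractor = Callable[[Path], Iterable[str]]


def write_pages(pdf_path: Path, out_dir: Path, extract_pages: PageExtractor) -> List[Path]:
    """Write one UTF-8 text file per page and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, text in enumerate(extract_pages(pdf_path), start=1):
        # Assumed naming scheme: <pdf name>_page0001.txt, <pdf name>_page0002.txt, ...
        target = out_dir / f"{pdf_path.stem}_page{number:04d}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


def convert_pending(conn: sqlite3.Connection, out_dir: Path,
                    extract_pages: PageExtractor) -> int:
    """Convert every row of the assumed `documents` table not yet marked done."""
    rows = conn.execute(
        "SELECT id, path FROM documents WHERE converted = 0"
    ).fetchall()
    for doc_id, path in rows:
        write_pages(Path(path), out_dir, extract_pages)
        # Mark the document so a later run skips it.
        conn.execute("UPDATE documents SET converted = 1 WHERE id = ?", (doc_id,))
    conn.commit()
    return len(rows)
```

Keeping the extractor as a parameter means the database loop and the
per-page file writing stay the same whichever conversion method ends up
being used.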