Topic: PDF to Text open source library?
Conf: (P-PDF) Developers, Msg: 142260
Date: 11/4/2005 09:40 PM
Well, hi to all. I am new to PDF development and recently took on a project from my university as my last-semester practice. It's about recognizing multi-language text in PDF files.
I took a look at the PDF Reference document, but the format is rather complicated and unfortunately I don't have time to program the text extraction myself. So I was wondering if there's an open source library in Java or C++ that would let me concentrate on the recognition part.
I also have a question: is there a way to identify the codepage after the extraction? Something like the codepage and lang tags in HTML?