The latest addition to Munich-based PDFlib's suite of developer products is the PDFlib Text Extraction Toolkit. Also known as PDFlib TET, the software package extracts text from PDF documents, converting it to Unicode strings while preserving font and glyph information. The toolkit is currently available as a library, a component and a command-line tool.
Suggested uses include the development of software for searching text, implementing a search engine to process large PDF archives, extracting text for storage or translation, converting PDF text into other formats, content-based processing of PDF documents (e.g. highlighting keywords) and comparing text between multiple PDF documents.
TET has been designed for standalone use and does not require any third-party software to run effectively. Additionally, the product is robust enough for multi-threaded server use, significantly increasing its capacity. Versions for Windows, Macintosh and several UNIX platforms are available, with language bindings for use with various programming environments.
PDFlib TET is currently available for purchase and download.
PDF publishing is mainstream these days, and it's certainly more efficient and eco-friendly than conventional distribution channels. Sometimes, though, it's a pleasure to sit down with a printed magazine, and many readers will never enjoy reading long items digitally. MagCloud is a self-publishing service that combines e-publishing with print-on-demand.
Despite the numerous benefits, there can be potential issues with the conversion of paper documents into electronic archives. When scanning paper pages into PDF, it's possible to end up with the odd- and even-numbered pages in separate PDF files. It can be very time-consuming to collate them manually, but there is an easier way. Sean Stewart explains.
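The collation problem described above comes down to interleaving two page sequences. The following is a minimal Python sketch of that logic only, not the method from the article: the function name `collate` and its `even_reversed` flag are hypothetical, and it works on plain lists of page labels rather than on real PDF files. The flag covers the common case where the stack was flipped over for the second pass, so the even-numbered pages were scanned last to first.

```python
from itertools import zip_longest

# Sentinel marking "no page here" when one scan is shorter than the other.
_MISSING = object()


def collate(odd_pages, even_pages, even_reversed=False):
    """Interleave odd- and even-numbered pages into reading order.

    odd_pages and even_pages are sequences of page objects or labels.
    Set even_reversed=True if the even pages were scanned back to front.
    """
    evens = list(even_pages)
    if even_reversed:
        evens.reverse()

    merged = []
    for odd, even in zip_longest(odd_pages, evens, fillvalue=_MISSING):
        if odd is not _MISSING:
            merged.append(odd)
        if even is not _MISSING:
            merged.append(even)
    return merged


if __name__ == "__main__":
    # A five-page document: three odd pages, two even pages.
    print(collate([1, 3, 5], [2, 4]))
```

The same ordering could then be applied to the page objects of whichever PDF library is in use, writing the pages out in the order returned.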
BCL easyPDF SDK is a set of PDF programming libraries designed specifically to help software developers and programmers build and deploy enterprise-class PDF applications for corporate-wide PDF...