Topic: /Charset definition
Conf: (P-PDF) Developers, Msg: 51087
From: aandi
Date: 5/29/2002 04:43 PM
Let's see, some things to consider.
1. Some files do not have text that can be extracted. Before assuming you have missed something, check that Acrobat is able to copy/paste text. If it cannot, then you cannot either.
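2. /CharSet isn't anything that can help. It only lists the glyph names present in a Type 1 font subset; it says nothing about which character codes map to which characters.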
3. You say the encoding is Mac encoding. Have you confirmed this from the font's /Encoding entry, or are you assuming it? In the general case you may have to parse the embedded font program to get the font's built-in encoding.
4. "check to see if it an escape character and lookup the unicode": I don't understand this. Tj recognises no escape characters of its own. There are the regular PDF string escapes, which you should process before examining the string data, but there is no way to get Unicode into a Tj string through escapes, unless the underlying font has a Unicode encoding (CMap).
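To make point 4 concrete, here is a minimal sketch of the "process the regular string escapes first" step for a literal string operand. It assumes `raw` holds the bytes between the outer parentheses of something like (Caf\216) Tj; the function name `unescape_pdf_literal` is hypothetical, not part of any PDF library. Hex strings (<...>) are decoded differently and are not handled here.

```python
# Sketch: undo PDF literal-string escapes before interpreting Tj bytes.
# unescape_pdf_literal is a hypothetical helper name, not a library API.

_SIMPLE_ESCAPES = {
    ord("n"): b"\n",
    ord("r"): b"\r",
    ord("t"): b"\t",
    ord("b"): b"\b",
    ord("f"): b"\f",
    ord("("): b"(",
    ord(")"): b")",
    ord("\\"): b"\\",
}


def unescape_pdf_literal(raw: bytes) -> bytes:
    """Return the string's byte values with all escapes resolved."""
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        c = raw[i]
        if c == 0x0D:  # unescaped CR or CRLF inside a string reads as LF
            out.append(0x0A)
            i += 2 if raw[i + 1:i + 2] == b"\n" else 1
            continue
        if c != 0x5C:  # not a backslash: copy the byte through unchanged
            out.append(c)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= n:  # lone trailing backslash: nothing follows it
            break
        e = raw[i]
        if e in _SIMPLE_ESCAPES:
            out += _SIMPLE_ESCAPES[e]
            i += 1
        elif 0x30 <= e <= 0x37:  # \d, \dd or \ddd octal character code
            j = i
            while j < n and j - i < 3 and 0x30 <= raw[j] <= 0x37:
                j += 1
            out.append(int(raw[i:j], 8) & 0xFF)  # high-order overflow ignored
            i = j
        elif e == 0x0D:  # backslash + CR or CRLF: line continuation
            i += 2 if raw[i + 1:i + 2] == b"\n" else 1
        elif e == 0x0A:  # backslash + LF: line continuation
            i += 1
        else:  # unrecognised escape: the backslash is dropped
            out.append(e)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(unescape_pdf_literal(b"Caf\\216"))  # b'Caf\x8e'
```

The output is still just character codes, not Unicode. Only after this step do the bytes go through the font's encoding: if the font really does use MacRomanEncoding, code 0x8E (octal 216) is eacute.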