Topic: Text in CreateWordHilite does not correspond to page text.
Conf: (P-PDF) Beginners, Msg: 82187
Date: 3/4/2003 01:28 AM
I am writing a VB application that uses the Acrobat object model to insert highlights over certain words. My problem is that the selection I create for a given word index does not always contain the same text as the word at that index when extracted from the page.
The sequence of operations I am trying to perform is:
1. Open the document and get the Page to determine the Page Size using Acrobat.CAcroPDPage.GetSize.
2. Create an Acrobat.CAcroRect which I can use with Acrobat.CAcroPDDoc.CreateTextSelect to get an Acrobat.CAcroPDTextSelect object for the entire page.
3. Build a page text buffer in VB containing all the text on the page by calling Acrobat.CAcroPDTextSelect.GetText.
4. In VB, search the page text buffer for specific words, patterns, or phrases.
5. Take the word number of each match (i.e. the index passed to Acrobat.CAcroPDTextSelect.GetText that returned the matching word) and add a highlight for it to an empty Acrobat.CAcroHiliteList.
6. Get an Acrobat.CAcroPDTextSelect from Acrobat.CAcroPDPage.CreateWordHilite and compare the result of its GetText with the matching text in the page text buffer. This is where the problem appears: the two are sometimes different. Typically, words that match early in the page text buffer are highlighted correctly, while words towards the end of the buffer are the ones less likely to match the text in the highlighted area.
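The bookkeeping in steps 3 to 5 can be sketched without Acrobat. The Python below is only an illustration of the idea, not my actual VB code: `chunks` is a hypothetical list standing in for the strings returned by successive GetText calls, and `build_buffer` and `find_chunk_indices` are made-up helper names. It records the character offset at which each chunk starts, so that a search hit in the buffer can be mapped back to the index of the chunk that contains it.

```python
import bisect

def build_buffer(chunks):
    """Concatenate chunks and remember the offset at which each one starts."""
    starts = []
    pos = 0
    for chunk in chunks:
        starts.append(pos)
        pos += len(chunk)
    return "".join(chunks), starts

def find_chunk_indices(buffer, starts, needle):
    """Return the chunk index for every occurrence of needle in buffer."""
    hits = []
    pos = buffer.find(needle)
    while pos != -1:
        # Rightmost chunk whose start offset is <= pos.
        hits.append(bisect.bisect_right(starts, pos) - 1)
        pos = buffer.find(needle, pos + 1)
    return hits

# Hypothetical chunks; in the real application these would come from GetText.
chunks = ["The ", "quick ", "brown ", "fox ", "jumps ", "over ", "the ", "fox"]
buffer, starts = build_buffer(chunks)
indices = find_chunk_indices(buffer, starts, "fox")
```

With these sample chunks, `indices` is `[3, 7]`, which are the word numbers I would then use in step 5. The sketch assumes one chunk per word; if the real chunks do not line up one-to-one with words, that alone could explain a mismatch.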
I can't use any of the search techniques already available in Acrobat because I need to build my own custom searches, which may interact with other things such as databases and text files.
Has anyone else tried this successfully or had similar problems?
So far I can only think of the following possible explanations:
1. I have not selected all the text on the page when building the page text buffer (although even if I set the page selection to be huge, I get the same text back and the same problems).
2. There are hidden characters or words that are not exposed by the GetText calls I use to compile the text (this would be difficult to compensate for).
3. Existing highlights, or other document content or formatting, prevent the selected rectangles from retrieving the correct text (if so, what would a document need to satisfy for my technique to work?).
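One check that might help narrow these down is to compare, index by index, the word from the page text buffer with the text returned for a one-word highlight at the same index, and to see whether the mismatches look like a steady shift. Below is a rough Python sketch of that comparison, assuming both sequences have already been collected into plain lists. The names `buffer_words`, `hilite_words`, and `drift_report` are hypothetical, and the sample data is invented purely for illustration.

```python
def drift_report(buffer_words, hilite_words, window=3):
    """List (index, expected, actual, drift) for every mismatched index.

    drift is the offset to the nearest buffer word equal to the highlighted
    text, searched within `window` positions, or None if none is found.
    """
    report = []
    for i, (expected, actual) in enumerate(zip(buffer_words, hilite_words)):
        if expected == actual:
            continue
        drift = None
        for k in range(1, window + 1):
            for j in (i - k, i + k):
                if 0 <= j < len(buffer_words) and buffer_words[j] == actual:
                    drift = j - i
                    break
            if drift is not None:
                break
        report.append((i, expected, actual, drift))
    return report

# Invented sample: the highlight at index 2 lands on an empty "hidden" word,
# and every later highlight is one word behind the buffer.
buffer_words = ["Total", "due", "by", "Friday", "noon"]
hilite_words = ["Total", "due", "", "by", "Friday"]
report = drift_report(buffer_words, hilite_words)
```

If the report showed the same non-zero drift for every index after some point, that would seem to point towards explanation 2, i.e. something being counted by the highlight's word index that GetText did not return. Scattered mismatches with no consistent drift would suggest looking elsewhere.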
Any suggestions or pointers would be greatly appreciated. Thank you.