Topic: I've got a major project with problems. Help !
Conf: (P-PDF) Paper to PDF, Msg: 31025
From: Maart
Date: 12/5/2001 03:04 AM
To whoever is nice enough to help me out,
I am Maarten Janssen, a 19-year-old guy from the Netherlands, and I am working on a job for a lady who is writing a book on a murder case.
Over the years this lady has collected an enormous amount of data (mostly transcripts of trial proceedings, interviews, official certificates and orders, etc.). She wants all this data to be organized and put on a CD, which she will then include with the book.
I will now try to explain my problem.
For the menu I decided to go with HTML, which is simple to write and use. When the CD is inserted, a browser will start automatically and an HTML index file will come up. From here the user has three options:
1) Read the book from the CD (in PDF format). This version will have hyperlinks wherever there is extra information on a paragraph in the book. I know how to do this; it is not so hard and I will be able to do it myself. The publisher will deliver the book in PDF format and I will just add the hyperlinks.
2) A mapped layout: the user clicks his way through a tree structure to the data he is looking for. The main categories will be Events and People.
Now here is my first problem.
I've scanned most of the papers with a sheet-fed scanner, just in black and white at 200 dpi. The documents were scanned with PaperPort, a well-known scanning program. I created indexes of every subject (pre-trial, trial, post-trial, background info) in a Word document, and I put each subject in a separate folder. So right now I have a couple of folders that together contain thousands of files. The files are in the PaperPort format (.max). I know what is in which folder thanks to my own indexes. What I want to do is convert these files to PDF. I tried this with Distiller and it works, but it does not convert the pages into text. I need text because I want to build a searchable index at the end, and for that the documents have to contain text rather than just scanned images.
Searching will also be the third option: 3) Search the database created by Adobe Acrobat.
The question is whether I can search this database from an HTML page, so that I just have a field in an HTML page where you enter your keywords and it shows the results, instead of having to open Acrobat, go to the search index, etc. Is this possible, and if yes, how?
If Acrobat is not able to convert my scanned pages into text, can I then add keywords to each document that it will use to build the index? Can I also tell it to use ONLY these keywords to create the index?
If it creates an indexed database, does it also create a page in which all the keywords it found are listed alphabetically?
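To make the last two questions concrete, here is a minimal Python sketch of the kind of keyword-only index being asked about. It is not how Acrobat builds or stores its index; it only illustrates the idea of tagging each document with keywords, listing those keywords alphabetically, and looking documents up by keyword. The file names, the keywords, and the function names (build_index, alphabetical_listing, search) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: file names and keywords are invented for illustration.
DOCUMENT_KEYWORDS = {
    "trial/day01.pdf": ["testimony", "witness", "verdict"],
    "pre-trial/warrant.pdf": ["warrant", "arrest"],
    "background/interview01.pdf": ["witness", "interview"],
}


def build_index(document_keywords):
    """Map each lower-cased keyword to the sorted list of documents tagged with it."""
    index = defaultdict(set)
    for document, keywords in document_keywords.items():
        for keyword in keywords:
            index[keyword.strip().lower()].add(document)
    return {keyword: sorted(documents) for keyword, documents in index.items()}


def alphabetical_listing(index):
    """Return one line per keyword, in alphabetical order, with its documents."""
    return [f"{keyword}: {', '.join(index[keyword])}" for keyword in sorted(index)]


def search(index, query):
    """Return the documents tagged with every keyword in the query (AND search)."""
    terms = [term.lower() for term in query.split()]
    if not terms:
        return []
    matches = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    index = build_index(DOCUMENT_KEYWORDS)
    for line in alphabetical_listing(index):
        print(line)
    print(search(index, "witness"))
```

Only the assigned keywords end up in this index, never the full text of the pages, which is the behaviour the "ONLY these keywords" question is about.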
I hope there is someone who can help me out.
Thank you very much!
Regards,
Maarten Janssen
maartenontheroad@hotmail.com