PDF In-Depth

Intro to Acrobat 5 Batch Processing

October 20, 2001


An introduction to the capabilities of batch processing and scripting

In April 2001, Adobe Systems released a major upgrade of its Acrobat suite of applications. Version 5 has many new and exciting features, including enhanced batch processing capabilities. In the article "Acrobat 5.0: Focus on the Document" by Duff Johnson and Susan Frank, the authors state:

"The all-new batch processing utility allows almost any command to be run in batch mode, including tiff conversion to PDF and saving PDF files to PostScript. JavaScripts can also be run in batch. This feature alone is worth the price of admission!"

I couldn't agree with this statement more!

The purpose of this article is to give the reader a sense of the capabilities of the new "batch" feature. Whether or not you have upgraded to Acrobat 5, perhaps you will see that batch processing can solve some of your problems or streamline your workflow.

The batch feature first appeared in Adobe Acrobat Exchange 3.0. Known then as "Batch Optimize," this feature could optimize all PDF files in a selected folder (optionally, in all subfolders) and could create (or delete) all thumbnails.

In Acrobat 4.0, the batch feature, re-named to "Batch Process," was extended to set (or change) the open password, to set security and to set the open document information.

In Acrobat 5.0, the approach to and the interface for batch processing have completely changed. There is a much wider variety of standard commands that can be run on the selected files; customized commands are possible through a special 'Execute JavaScript' command. Acrobat 5.0 gives the user the ability to select individual files, not just all files in a folder; files open in the viewer can also be selected. Finally, the user can select the destination of the output files, and, optionally, the files can be renamed by attaching a prefix and/or a suffix to the base name.

The Standard Sequences

The batch processing feature comes with a number of standard sequences. These sequences are divided into five categories according to their functionality:

  • Comments
  • Document
  • JavaScript
  • Page
  • PDF Consultant


These will be discussed in detail below.

A Simple Walkthrough

Even though this article is not a tutorial on using the batch feature, we provide some basic steps for creating a batch sequence.

PROBLEM: For a collection of PDF files, we wish to enter some of the document summary information: title, subject, author, keywords and binding.

You can access this feature through the File menu: File > Batch Processing. Choosing Batch Processing opens the "Batch Sequences" dialog. Choose "New Sequence..." and type in "A Test Sequence" to reach the "Batch Edit Sequence" dialog.


The Batch Edit Sequence dialog has three parts:


  1. Select sequence of commands: Through this item, you define the commands you want performed on each of the selected files. The selected files are determined by item 2. See the section "Standard Sequences Enumerated" for a list and brief discussion of the commands provided by the batch feature.

  2. Run commands on: Select the files to be processed. The possible choices are "Ask When Sequence is Run," "Selected Files," "Selected Folder" and "Files Open in Acrobat."

  3. Select output location: Select where the processed files are to go. The choices are "Same Folder as Original(s)," "Specific Folder," "Ask When Sequence is Run" and "Don't Save Changes."

Step 1: Each of these steps will be discussed in detail below. For now, click on the "Select Commands" button to open the "Edit Sequence" dialog. You'll see the five categories mentioned above in the left-hand panel. Highlight "Document Summary" under the Document category and click on the "Add" button. The "Document Summary" item will appear in the right-hand panel. More than one command can be run per file, so you can make multiple choices; "Security," for example, can be chosen as well.

Next, double-click on "Document Summary" (or single-click it, then click on the "Edit" button). A dialog appears in which the title, subject, author, keywords and binding can be pre-entered. For our example, enter only the author: remove the check from "Leave As Is" under the author, then enter the author's name, "A. C. Robat". Now click on the "OK" button to return to the "Edit Sequence" dialog, then click on "OK" again to return to the "Edit Batch Sequence" dialog.

Step 2: In the "Edit Batch Sequence" dialog, select your test files. For example, you can choose "Selected Files", then browse and select the files (click on the "Choose ..." button). All chosen files must come from the same folder.

Step 3: Set the third item on the "Edit Batch Sequence" dialog to "Same Folder as Original(s)".

If you click on "Output Options...", you'll get to the "Output Options" dialog, where you can change the name of the output file if desired. There is also an output format dropdown menu; it should be set to "Adobe PDF Files". This menu is key to making file format conversions to Encapsulated PostScript, JPEG, PNG, PostScript, Rich Text Format and TIFF files. Under this dropdown menu is a check box for "Fast Web View (PDF only)"; check or uncheck this item to optimize the output files for the Web, as desired.

You are now ready to run the batch sequence. As necessary, click on the "OK" button in the "Batch Edit Sequence" dialog to return to the "Batch Sequences" dialog. Finally, click on the "Run Sequence" button; all the pre-entered information will be entered into the document summary of each of the selected files.


An Important Variation: We just entered the author's name into the author field of the document summary, but what about the title and subject fields? Whereas the author can be pre-entered, the title and subject vary from document to document. To modify the title and subject fields for each document, we need to pause the batch just before each document is processed. The "Interactive Mode option" is used for this purpose.

Return to the "Edit Sequence" dialog of our test sequence. In the right pane, click on the "Interactive Mode option" (the inset rectangle that appears to the left of the command). With the "Interactive Mode" turned on, the batch process will pause before the command is executed, allowing the user to modify any associated options.

Now run the test sequence again. The document summary dialog will be presented for each of the selected documents, with the author field already filled in as "A. C. Robat". Enter the title and subject of the document and click on "OK". When the batch has finished its run, you will find that the title, subject and author fields have all been filled in.

Standard Sequences Enumerated

As mentioned above, there are five categories of standard sequences. Here's a quick survey so that readers who have not upgraded to Acrobat 5 can get a sense of the capabilities of batch.

  1. Comments: There are two sequences listed here, "Delete All Comments" and "Summarize Comments". The latter creates, for each selected file, a report (a PDF document) summarizing the comments contained in that file. The report is saved by attaching a suffix of 'sum' to the base file name.

  2. Document: There are ten sequences listed: "Accessibility Checker", "Document Summary", "Embed All Thumbnails", "Extract Images as JPEG", "Extract Images as PNG", "Extract Images as TIFF", "Print", "Remove Embedded Thumbnails", "Security", and "Set Open Options".

    The "Accessibility Checker" generally follows the Web Content Accessibility Guidelines 1.0 (see also the document "Adobe Acrobat 5.0 and Section 508," a tagged PDF).

    The extraction sequences have a dialog that allows you to set the graphics options and the path of the saved images. The "Document Summary" was discussed in A Simple Walkthrough.

  3. JavaScript: Only one command is listed, "Execute JavaScript." Because of the importance of this command, an entire section is devoted to it. See The Execute JavaScript Command below.

  4. Page: There are five largely self-explanatory commands:

    • "Crop Pages"
    • "Delete Pages"
    • "Insert Pages"
    • "Number Pages"
    • "Rotate Pages"

  5. PDF Consultant: There are three commands:

    • "Audit Space"
    • "Detect and Remove"
    • "Optimize Space"

    These functions are also available in non-batch form under the Tools > PDF Consultant menu. The sequence "Audit Space" gives a summary report of the different objects present within the file: thumbnails, bookmarks, content streams, forms, etc. The report is a modeless dialog; there seems to be no way to copy this information for closer examination. "Detect and Remove" and "Optimize Space" must be used very cautiously.

    For "Detect and Remove" you can optionally remove all comments, form actions, JavaScript actions, external cross-references and alternate images. DANGER (this is not made clear in the dialog): the "All Comments" option also removes ALL form fields! Forms and comments should be separated. One might want to keep the form fields and remove all field actions, while also removing all comments, file attachments, etc. Currently, this cannot be done.

    The "Optimize Space" command allows you to remove bookmarks with invalid destinations, remove links to invalid destinations, and to change unused named destinations. I took a test document (610k) with a large number of named destinations and set up "Optimize Space" to convert all named destinations to direct links. The resultant file was 563k in size; that's a 7.7 percent reduction in size.
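For the record, the percentage quoted above works out as follows (file sizes as reported, in kilobytes):

```latex
\frac{610\,\mathrm{KB} - 563\,\mathrm{KB}}{610\,\mathrm{KB}} = \frac{47}{610} \approx 0.077 = 7.7\%
```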

On File Conversions

It is possible to make a variety of file conversions: from PDF, BMP, GIF, JPEG, PCX, PNG and TIFF to any one of PDF, EPS, JPEG, PNG, PS, RTF and TIFF. I believe this is accomplished by first converting the input file to PDF, then converting the PDF to the output format.

A file conversion can be created in the "Edit Batch Sequence" dialog.

  1. Select Sequence of Commands: For a straight conversion sequence this item can be empty (no commands listed).

  2. Run Commands On: If you choose "Selected Folder," click on the "Source File Options" button. Within this dialog, you can include or exclude various file formats for conversion. Note that PDF cannot be excluded.

  3. Select Output Location: Usually, you would choose "Same Folder as original(s)" or, "Specific Folder".

    Click on the "Output Options" button to obtain the "Output Options" dialog. In the "Output Format" region, use the dropdown menu to define the output format.

No control over the quality of the conversion is offered within the batch dialogs; however, when conversion occurs, the current values of the format settings as listed in the menu items "File > Open as Adobe PDF" (for input) and "File > Save As" (for output) of the various file formats are used in the conversion process. By adjusting these settings, you can control the final output of the file conversion.

This seems very inconvenient, but it works. Perhaps the next version of Acrobat will allow you to change conversion settings from within the Batch dialog.

Miscellaneous Comments

  • Multiple commands can be run per file. Commands are applied in the order in which they are listed. The order can be modified in the "Edit Sequence" dialog.

  • When "2. Run commands on:" is set to "Selected Folder", the commands will be run on every file in that folder and every subfolder. If you don't want the batch to run on every subfolder, set "2. Run commands on" to "Selected Files" (or "Ask When Sequence is Run") and select the individual files you want the batch to process.

  • In "2. Run commands on:", for all choices except "Files Open in Acrobat", the selected files must come from the same folder (and all subfolders in the case of "Selected Folder"). You can process files from different folders, but only if they are open in Acrobat and "Run commands on" is set to "Files Open in Acrobat."

The Acrobat CD: Sample Sequences

Locating the User's Sequences Folder

The Acrobat CD has many sample sequences for you to use and/or modify. These sequences should be placed in the user's "Sequences" folder. The location of this folder depends on the platform and operating system.

If you created the test sequence, "A Test Sequence," as described above, Acrobat saved this sequence in the user's "Sequences" folder under the name of "A Test Sequence.sequ". Search for this file "A Test Sequence.sequ" (alternatively, search for the "Sequences" folder) and you have found the user's sequence folder.

Installing the Sample Sequences


NOTE: See Dr. Story's chart with descriptions of the free batch scripts available on the Acrobat 5 CD-ROM.

Place your Acrobat 5 CD in your CD-drive and click on the "Additional Materials" button, then click on the "Browse Batch Folder" button. (Those who purchased Acrobat 5 with an Academic discount may not have some of the extras, such as the batch sequence samples.) You will see a PDF file called "BatchSequences.pdf" and a "Sequences" folder.

Enter the "Sequences" folder and copy all the batch sequences to the user's "Sequences" folder, the location of which was just described. (Copy the "BatchSequences.pdf" file to some folder on your hard drive as well. This file will be discussed below.) You may not want to use all these sequences; unwanted ones can be deleted later.

Important: All the sequences on the CD are 'read-only,' so to use any of them you'll have to remove this attribute.

The next time you start Acrobat, all these sample sequences will be listed under the "Batch Processing" submenu.

Important: Before attempting to run any of these sample sequences, read their documentation! The documentation for each of these sequences is contained in the PDF file "BatchSequences.pdf".

Fast Web View: The "Fast Web View" sequence is one of the provided sequences everyone should install on their system. It enables you to batch optimize the selected files. If you edit the sequence, it appears to do nothing! In the "Batch Edit Sequence" dialog, click on "Output Options." You'll see that "Fast Web View (PDF only)" is checked. All this sequence does is take a file (it doesn't even have to be a PDF) and save it in optimized form. Before you run this sequence, you need to adjust the "Run commands on" and/or the "Select Output location" settings appropriately. If you regularly run this sequence on a particular folder, you can set up a "Fast Web View (my Folder)" sequence so you can just run it without making any adjustments.

On Processing files with Password Security

Let me illustrate the use of one of these sample sequences, "Change Security to None." Suppose you had a collection of PDF files all with the same open password and you wanted, for example, to remove this password protection as well as all other security.

If you select the files to be processed and run the batch on them, a "Warnings and Errors" alert box reports that the documents could not be opened, and no documents are processed. This is because the documents have password protection!

To resolve this problem, go to Edit > Preferences > General and click on "Batch Processing" in the left-hand panel. Under "Security Handler" choose "Acrobat Standard Security" and click on "OK." Now, run the "Change Security to None" sequence again on your test files. A "Password" dialog appears; enter the common password for all the files to be processed. Now the sequence runs as advertised!

When processing password protected files, always check the current preferences for batch processing, and don't forget to reset the handler back to "Do Not Ask for Password" after you have finished processing secure files.

The Execute JavaScript Command

Acrobat 5 comes with a tremendously expanded JavaScript capability; for example, the AcroJS manual grew from about 65 pages for Acrobat 4.05 to 295 pages for Acrobat 5.0. Much of the functionality of Acrobat has been exposed to JavaScript. With JavaScript you can manipulate pages, bookmarks, form fields and security; connect to a database, populate a form or extract form data to a database; sign documents; write reports based on information obtained by JavaScript methods; and much, much more.

With this in mind, Acrobat provides the most powerful batch command of all, "Execute JavaScript." The functionality of JavaScript for individual documents is available in batch. The Acrobat CD provides two important resources for the "Execute JavaScript" command.


  1. BatchSequences.pdf. (See the 'Batch' folder of the CD.) This is a two-part document, "Basic Concepts" and "Problems."

    Basic Concepts describes how to write your own JavaScript sequences; it discusses various tricks and techniques for writing such sequences. The Problems section is the documentation of the sample sequences that accompany the distribution.

  2. The Sample Sequences: Installation for these sequences was covered in The Acrobat CD: Sample Sequences.

As documented in "BatchSequences.pdf," the sample sequences solve certain problems users have wanted simple solutions to for years. The sample sequences are presented as solutions (or partial solutions) to problems. The problems are broken down into categories: Utility sequences, bookmarks, Icons, page extraction, database problems, security, Insert Comments and Form Fields, Spell checking, and Comments.

The names of the sample sequences are quite suggestive of their purpose, for example:

  • "Count Bookmarks"
  • "Insert Navigation Icons"
  • "Populate and Save"
  • "Gather DigSig Info"
  • "Signature Sign All"
  • "Insert Stamp"
  • "Cross-Document Comment Summary"

Depending on the problem to be solved, using "Execute JavaScript" could be very easy or a little tricky. The following three examples -- aptly named "The Good," "The Bad" and "The Ugly" -- illustrate this point.

The Good

PROBLEM: Extract all pages and save to a folder.

Solution: This example is the "Extract Pages to Folder" sequence that comes on the Acrobat CD. For each of the selected files, this sequence extracts each page and saves it under the file name filename_#.pdf, where # is the page index (counting from 0 in the script below). This sequence uses a regular expression to strip off the path of the file and get at the base file name, then uses the Doc.extractPages() method to extract the pages.

NOTE: The Doc object of the file currently being processed can be accessed through 'this'; thus, 'this.path' is referring to the path of the file being processed.

/* Extract Pages to Folder */

// regular expression to acquire the base name of the file
var re = /.*\/|\.pdf$/ig;

// filename is the base name of the file Acrobat is working on
var filename = this.path.replace(re, "");

try {
    for (var i = 0; i < this.numPages; i++) {
        // See the AcroJS documentation on Parameter
        // Specification for Methods, page 32
        this.extractPages({
            nStart: i,   // 0-based page index; with no nEnd, one page is extracted
            cPath: "/F/temp/" + filename + "_" + i + ".pdf"  // device-independent path
        });
    }
} catch (e) { console.println("Aborted: " + e) }

As you can see, that wasn't too bad; it was Good!
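To see exactly what the regular expression in "Extract Pages to Folder" does, here is a small sketch that runs the same expression in an ordinary JavaScript engine (for example Node.js), outside Acrobat. The sample paths and the helper names baseName, pageFileName and allPageFileNames are illustrative assumptions; inside Acrobat the path would come from this.path.

```javascript
// Sketch: the base-name regular expression from "Extract Pages to Folder",
// run in a plain JavaScript engine (no Acrobat objects involved).
// Acrobat's Doc.path is a device-independent path such as "/C/docs/report.pdf".

// Strip everything up to the last "/" and a trailing ".pdf" (case-insensitive).
function baseName(path) {
  var re = /.*\/|\.pdf$/ig;
  return path.replace(re, "");
}

// Build the output name for page i, following the script's pattern
// folder + filename + "_" + i + ".pdf" (folder is a parameter here).
function pageFileName(folder, path, i) {
  return folder + baseName(path) + "_" + i + ".pdf";
}

// List the names the script would write for a document with numPages pages.
function allPageFileNames(folder, path, numPages) {
  var names = [];
  for (var i = 0; i < numPages; i++) {
    names.push(pageFileName(folder, path, i));
  }
  return names;
}

console.log(allPageFileNames("/F/temp/", "/C/reports/Annual.PDF", 3));
```

Note that the page suffix counts from 0, so the first page of Annual.PDF becomes Annual_0.pdf; use i + 1 in the file name if you prefer numbering from 1.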

The Bad

PROBLEM: Count the total number of pages in the selected set of documents and report this number to the console.

Solution: To count the number of pages, we need a global variable to hold the count while the process runs. Create a new sequence called "Count All Pages". This is a good chance to use the "Don't Save Changes" option for "Select output location". (We want to count pages, not modify the documents themselves. Choosing this output location will make the batch run much faster.) In the "Batch Edit Sequence" dialog, click on "Select Commands," choose "Execute JavaScript," and click on "Edit." Copy the following script into the editing window:

/* Count Pages */
// initialize the running total on the first document
if ( typeof global.PageCnt == "undefined")
    global.PageCnt = 0;
global.PageCnt += this.numPages;
console.println("The file " + this.path + " has "
    + this.numPages + " pages.");
console.println("Total so far = " + global.PageCnt);

Comments: If global.PageCnt is undefined, it is initialized to 0. As the documents are processed, a running total of the page counts is maintained in the global variable ('global.PageCnt += this.numPages;') and the results are reported to the console. Typically, when doing "cross-document" reporting, global variables are needed to hold information; variables defined within the batch script are of local scope unless they are defined as global. After the batch is finished and you have your total, you need to execute delete global.PageCnt so that if you run this sequence again during the same session, the count does not continue from where the previous one left off. Or, don't delete it if you want to continue the count for files in other folders.
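The accumulator pattern behind "Count Pages" can be tried outside Acrobat with a minimal sketch in plain JavaScript (for example Node.js). Here acroGlobal is a hypothetical plain object standing in for Acrobat's global object, and each entry in the batch array stands in for the Doc object the script sees as this; the paths and page counts are made up for illustration.

```javascript
// Assumption: acroGlobal is a plain object standing in for Acrobat's `global`
// object, which keeps its properties between documents in a batch
// (and for the rest of the Acrobat session).
var acroGlobal = {};

// The per-document script; `doc` stands in for `this` in a batch JavaScript.
function countPages(doc) {
  if (typeof acroGlobal.PageCnt == "undefined")
    acroGlobal.PageCnt = 0;              // first document: initialize the total
  acroGlobal.PageCnt += doc.numPages;    // running total across documents
  console.log("The file " + doc.path + " has " + doc.numPages + " pages.");
  console.log("Total so far = " + acroGlobal.PageCnt);
}

// Simulate one batch run over three hypothetical documents.
var batch = [
  { path: "/C/docs/a.pdf", numPages: 3 },
  { path: "/C/docs/b.pdf", numPages: 10 },
  { path: "/C/docs/c.pdf", numPages: 1 }
];
batch.forEach(countPages);
var total = acroGlobal.PageCnt;

// Without this, a second run in the same session would continue from `total`.
delete acroGlobal.PageCnt;
```

Because the total lives on the shared object rather than in a local variable, it survives from one document to the next; deleting the property is what makes the next run start again at 0.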

PRACTICE: Modify the above code to count the total number of words in the selected set of documents.

The Ugly

Writing a report based on cross-document processing can be tricky. The following example illustrates the first level of complication. See the code for the sample sequences "Cross Doc Comment Summary" or "Gather DigSig Info," and read about these sequences in the documentation file "BatchSequences.pdf," for the next level of complication.

PROBLEM: Report on ALL form fields in a given document.

Solution: The Report object (new to Acrobat 5.0) is an important tool for reporting results in a form that can be printed, saved or distributed to others. See the AcroJS manual for details and examples of using this object.

Important: This script can only be run on one file, not multiple files. (The multiple-file case is the next level of complication, left to the reader.)

/* List All Fields (1 doc) */
var bmReport = new Report();       // create a report
bmReport.size = 2;                 // choose text size
bmReport.writeText(this.title);    // write title to report
bmReport.writeText(this.path);     // write path to report
bmReport.writeText(" ");           // write blank line
bmReport.size = 1.5;
bmReport.writeText("List all Field Values");
bmReport.writeText(" ");
bmReport.size = 1;                 // text size for report

global.bmRep = bmReport;           // make global

var fieldData, a;
var nFields = this.numFields;      // get number of fields
var cFieldName = "";

// Work our way through all fields in this document. Extracting
// info depends on the type of field it is, so we do a switch.
for (var i = 0; i < nFields; i++) {
    cFieldName = this.getNthFieldName(i);
    var oField = this.getField(cFieldName);
    switch (oField.type) {
    case "text":
        fieldData = util.printf("Field Name: %s Type: %s Value: %s",
            cFieldName, oField.type, oField.value);
        break;
    case "listbox":
        a = oField.currentValueIndices;
        if (typeof a == "number") {   // single selection: a is one index
            // "%s (%s)" prints the item's face value, then its export value
            fieldData = util.printf("Field Name: %s Type: %s Value: %s (%s)",
                cFieldName, oField.type,
                oField.getItemAt(a, false), oField.getItemAt(a, true));
        } else {                      // multiple selection: a is an array
            console.println("Multiple selection list box");
            fieldData = util.printf("Field Name: %s Type: %s Values: ",
                cFieldName, oField.type);
            for (var j = 0; j < a.length; j++)
                fieldData += util.printf("%s (%s)%s",
                    oField.getItemAt(a[j], false), oField.getItemAt(a[j], true),
                    (j == a.length - 1) ? "" : ", ");
        }
        break;
    case "combobox":
        a = oField.currentValueIndices;
        if (a == -1) {                // editable combo box, user-entered text
            fieldData = util.printf("Field Name: %s Type: %s Value: %s (user entered)",
                cFieldName, oField.type, oField.value);
        } else {
            fieldData = util.printf("Field Name: %s Type: %s Value: %s (%s)",
                cFieldName, oField.type,
                oField.getItemAt(a, false), oField.getItemAt(a, true));
        }
        break;
    case "checkbox":
        fieldData = util.printf("Field Name: %s Type: %s Value: %s",
            cFieldName, oField.type, oField.value);
        break;
    case "radiobutton":
        fieldData = util.printf("Field Name: %s Type: %s Value: %s",
            cFieldName, oField.type, oField.value);
        break;
    case "signature":
        console.println("Signature.value = " + oField.value);
        fieldData = util.printf("Field Name: %s Type: %s Value: %s",
            cFieldName, oField.type,
            (oField.value == "") ? "Not signed" : "Signed");
        break;
    case "button":
        fieldData = util.printf("Field Name: %s Type: %s Value: ",
            cFieldName, oField.type);
        break;
    }
    // write the fieldData to the report
    global.bmRep.writeText(fieldData);
}

Now we need to open the report so it will appear in the viewer. However, the report cannot be opened until the modeless box closes, so we wait. This is why we made the report object global: once the batch is over, all variables are lost except the global ones. The script below retries every 100 milliseconds until the report opens, then clears its own timer. Let's wait.......

global.wrtDoc = app.setInterval(
    'try {'
    +'reportDoc = global.bmRep.open("List of Field Values");'
    +'console.println("Executed Report.open");'
    +'app.clearInterval(global.wrtDoc);'
    +'delete global.wrtDoc;'
    +'console.println("Executed App.clearInterval");'
    +'reportDoc.info.title = "Listing of all Fields";'
    +'reportDoc.info.Author = "A. C. Robat";'
    +'} catch (e) {console.println("Waiting...: " + e);}'
    , 100);

If all goes according to plan, the report (in PDF) will appear in the viewer; we can only hope!
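The setInterval trick is a general "retry until ready" pattern: attempt the operation on a timer, and clear the timer as soon as it succeeds. The sketch below shows the pattern in plain JavaScript (Node.js or a browser), with the standard setInterval/clearInterval timers playing the roles of app.setInterval and app.clearInterval. The function retryUntilReady and the stand-in tryOpen are hypothetical; tryOpen simply fails twice to imitate a report that cannot be opened while the batch is still finishing.

```javascript
// Retry `action` every `intervalMs` milliseconds until it stops throwing,
// then stop the timer and hand the result to `onDone`.
function retryUntilReady(action, intervalMs, onDone) {
  var timer = setInterval(function () {
    var result;
    try {
      result = action();              // throws while the resource is not ready
    } catch (e) {
      console.log("Waiting...: " + e.message);
      return;                         // not ready; try again on the next tick
    }
    clearInterval(timer);             // success: stop polling (cf. app.clearInterval)
    onDone(result);
  }, intervalMs);
  return timer;
}

// Hypothetical stand-in for global.bmRep.open(): fails on the first two tries.
var attempts = 0;
function tryOpen() {
  attempts++;
  if (attempts < 3) throw new Error("batch dialog still open");
  return "List of Field Values";
}

retryUntilReady(tryOpen, 100, function (title) {
  console.log("Opened report: " + title + " after " + attempts + " attempts");
});
```

One design difference: app.setInterval takes its code as a string, which is why the Acrobat script builds the try/catch as concatenated text and keeps everything it needs in global; a plain JavaScript runtime can pass a function instead. In both versions, forgetting to clear the interval means the operation is repeated forever.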

Practice: Modify the above code to report on all properties of each type of field. Include such things as the bounding rectangle of the field and the page or pages on which it is located. Use color and tabbing to better organize the report.

Concluding Remarks

The new Batch Processing feature is very powerful and flexible indeed. As people begin to use it and explore its capabilities, it will become one of the most valuable features of Acrobat. I believe this first version of 'Batch' is an excellent start. My hearty congratulations to the Acrobat team for a job well done here!

As the popularity of the feature increases, however, there will be a greater need to improve 'Batch' in a number of different areas.

  1. User Interface. Very often, the people who run a batch (as part of a regular routine) are not the ones who wrote the batch. The interface must make it easier for non-technical people to adjust the parameters and run the batch. Perhaps the "Run Sequence Confirmation" dialog could offer a drop-down menu to change the input files.

  2. Better Sequence Management. If you include all the sample sequences provided on the Acrobat CD, you already have nineteen sequences; add a few more of your own, and you have a menu mess. The batch menu system needs a redesign, perhaps introducing folders to better organize sequences for different applications.

  3. Batch Level JavaScript. We have the notion of document-level JavaScript already. It would be nice to be able to call general batch-level scripts. This would clean up a lot of repeated code. This would also be a convenient place to hold cross-document variables.

  4. Begin/End Job. Cross-document processing can get nasty. One of the problems is that there is no notion of the beginning and the end of the job. Several "hacks", as documented in "BatchSequences.pdf", were necessary in order to detect the beginning of the batch and the pending end of the batch. A good feature would be a way of programmatically detecting these two events, or some formalized construct: a Begin Batch (End Batch) command, JavaScript only, that is run only once at the beginning (end) of the job.

  5. Report Object. This important object needs to continue to evolve, perhaps with support for running headers/footers and greater line control. It would be nice to interrupt writing a line to, say, create a button, or create a signature field and sign it. Improving the util.printf() method (to include tab fields) would help better organize a report. A better solution for writing reports than waiting until after the batch is run (see The Ugly example) must be found.

  6. Chaining Sequences together. For some problems, it is necessary to run two batch sequences on the same set of files; the first one gathers information that is used by the second one. For example, for cross-document report writing it is necessary to know the number of files being processed to determine when the end of the job occurs. The first sequence might count the files (see "Count PDF Files", a sample sequence on the Acrobat CD), while the second does the report writing. To write sophisticated batch sequences that perform complex tasks, "chaining of sequences" might be needed; i.e., one sequence calls the next. Also, one sequence needs to be able to adjust the parameters of the sequence it calls, such as setting the output folder.

  7. Extract Images. These standard batch commands need to have their own dialog for setting the quality of the images that will be saved.
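One way to approach the end-of-job problem described in item 6 is sketched below in plain JavaScript. It assumes the number of files, nFiles, has already been obtained (for example, by a counting sequence such as "Count PDF Files") and stored in a global; acroGlobal again stands in for Acrobat's global object, and processDocument is a hypothetical per-document script, not code from the sample sequences.

```javascript
// Hypothetical sketch: detecting the beginning and end of a batch job when the
// number of files (nFiles) has already been gathered by an earlier run.
// acroGlobal is a plain object standing in for Acrobat's `global` object.
var acroGlobal = { nFiles: 3 };   // assumed to be set by a prior counting run

function processDocument(doc) {
  if (typeof acroGlobal.nDone == "undefined") {
    // Beginning of the job: first document seen in this run.
    acroGlobal.nDone = 0;
    acroGlobal.lines = [];
  }
  acroGlobal.lines.push(doc.path + ": " + doc.numPages + " pages");
  acroGlobal.nDone++;

  if (acroGlobal.nDone === acroGlobal.nFiles) {
    // End of the job: last document processed, so build the report and
    // clean up the globals so the next run starts fresh.
    var report = acroGlobal.lines.join("\n");
    delete acroGlobal.nDone;
    delete acroGlobal.lines;
    return report;
  }
  return null;                    // job still in progress
}

// Simulate a batch over three hypothetical documents.
var docs = [
  { path: "/C/docs/a.pdf", numPages: 3 },
  { path: "/C/docs/b.pdf", numPages: 10 },
  { path: "/C/docs/c.pdf", numPages: 2 }
];
var finalReport = null;
docs.forEach(function (d) {
  var r = processDocument(d);
  if (r !== null) finalReport = r;
});
console.log(finalReport);
```

The weak point is visible in the design: if any file fails to open (for example, because of password security), the counter never reaches nFiles, the report is never written, and the leftover globals must be deleted by hand. This is why a formal Begin/End Job construct would be welcome.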

The 'Batch' is rich in features, and in possibilities. I encourage everyone to explore batch processing and see what it can do. This can not only help your workflow, but also encourage Adobe to continue to improve upon this excellent start.
