Planet PDF's AcroPDF Weblog
A daily chronicle of Acrobat/PDF-oriented newsbits

For week beginning 27 May 02
By Kurt Foss, Planet PDF Editor

Monday | Tuesday | Wednesday | Thursday | Friday


NOTE: Previous Weblogs will be archived at the end of each week, and a fresh Weblog will start here.

MONDAY

No Post Today: Off for Memorial Day holiday


Top | What is AcroPDF? | Archive | Feedback!


TUESDAY

Tracking Terror: The second-guessers have a new horse to flog, as evidenced by the recent spate of media reports speculating on 'who knew what when and did what, and why' regarding a variety of apparent early warning signs that -- if shared and heeded -- might have proved valuable in heading off the terrorist attacks on New York and Washington, D.C. last fall. Whether due to the lack of a proper communication infrastructure within key departments of the U.S. Government or, worse, because of bureaucratic bungling, some watchdogs are barking that there were several unrelated indicators from varied, trustworthy sources that together should have sent up a flare or two at the very least, and possibly even tipped off agencies like the CIA and FBI to the now-infamous events of September 11, 2001.

If there indeed are critical information-sharing roadblocks within and among the government's top agencies, departments and officials charged with protecting U.S. citizens, finger pointing and scapegoat naming can wait -- first, the issue(s) must be resolved as quickly and as thoroughly as possible. Part of that process requires a review of all relevant communication and information, both before and after September 11, with a full and fair public accounting.

Among the noteworthy information gathering and dissemination efforts recently released are the U.S. Department of State's 204-page "Patterns of Global Terrorism 2001" annual report and, from the Oklahoma Department of Libraries, its 98-page "Annotated Bibliography of Government Documents Related to the Threat of Terrorism and the Attacks of September 11, 2001." Released earlier, but related, is the "Securing the Homeland, Strengthening the Nation" report from and about the newly created "Office of Homeland Security." All documents -- as you might suspect -- are available in PDF.

Whether that's a good thing or not seemed debatable to Secretary of State Colin Powell when he introduced the 22nd such annual report to Congress on terrorism, as mandated by law. Powell reportedly opened his prepared commentary by noting that the 2001 Patterns of Global Terrorism ...

"is not only a very attractive book, but a very, very attractive CD-ROM as well. And it comes complete with its own Acrobat Reader. For those of you who think it might not load, it will load; we've tested it."

It's not clear from Powell's statement precisely what the perceived skeptics believed wouldn't load -- the CD itself, the PDF files or the free Adobe Acrobat Reader. The report is available online as a series of PDF files, or for those without bandwidth or patience constraints, as a single 34 MB file.

No disclaimers accompany the impressive bibliography published by the Oklahoma Department of Libraries, compiled by Kevin D. Motes of its U.S. Government Information Division, who wrote as an introduction:

"This bibliography is intended to serve as a means of access to information produced by the United States Government concerning the events of September 11. Unlike so many of the nations of the world, the United States considers fundamental the right of its citizens to know what their government is doing, the logic behind its actions, and the ramifications of its policies. To this end, our government produces copious quantities of informational materials that are freely accessible to the public through libraries and the Federal Depository Library system. This bibliography presents a sampling of the materials available through the Depository system, via the Internet, or both."

A limited print version of the bibliography -- it focuses on U.S. federal documents on terrorism, most specifically on attacks on the World Trade Center and the Pentagon -- is available for $500 per copy, so the freely downloadable PDF version looks especially appealing.


[Images: Global Terrorism | September 11 | Homeland Security]



WEDNESDAY

Humped Day for FBI: Little did we realize the impact this PDF-centric Weblog could have! Just a day after we featured several portable documents recalling and recounting the terrible, terrorist events of September 11 and suggesting prompt fixes were in order, the FBI director -- who began his new federal job shortly before the commercial airplane-enabled attacks -- today announced a major reorganization, with a new agency focus on fighting terrorism.

The refocusing of the FBI unveiled by Robert Mueller won't exactly be starting from scratch; the bureau began to shift resources and priorities almost as soon as the hijacked airliners ripped tragic holes through the World Trade Center, the Pentagon, organizations, families, economies and peacetime living as many in the world knew it. Atop the agency's highly visible list of "Top 10 Most Wanted Fugitives" is some guy named Bin Laden. The apparently still-living Al Qaeda leader also sits atop a more recently launched FBI list -- "Most Wanted Terrorists" -- which not surprisingly lists and pictures mainly others believed to have Al Qaeda ties and/or involvement with the September 11 attacks. Also no surprise, the FBI's role in investigating the 9/11 events and seemingly related activities is detailed prominently in the recently released FBI Laboratories annual report for 2001, available in PDF.

[Image: FBI and September 11]

The lab's director sets the scene and the department's mission:

"When planes crashed into the World Trade Center's Twin Towers, the Pentagon, and a Pennsylvania field on September 11, 2001, and later when anthrax was discovered in letters sent through the U.S. Postal Service, FBI Laboratory personnel were on the scene. Invariably, when peril threatens or takes the lives of American citizens around the world, the FBI Laboratory is called into action."

Of course, while those mind-numbing events were the most despicably significant, the FBI had other cases to investigate during the year, both high-profile -- including the subsequent anthrax threats and the prosecution of FBI agent-turned-spy Robert Hanssen -- and lesser-profile.

[Image: FBI and Anthrax]

Back to the year's major Lab investigation:

"On September 11, 2001, the Laboratory's Crisis Response Unit, Hazardous Materials Response Unit, and Bomb Data Center were immediately activated. Even before the Twin Towers collapsed, evidence response teams from FBI offices around the country were being dispatched to the crash sites. Fingerprint examiners from the Disaster Squad were sent to assist in identifying victims. Technical units from the Engineering Research Facility (ERF) at Quantico rebuilt the FBI's radio system in New York. Hundreds of radio units were deployed for crisis response and to support nationwide investigations."

"Since then, nearly all Laboratory units have examined evidence from hundreds of submissions or supported the criminal investigations and prosecutions that are still unfolding. The DNA units helped identify victims from biological remains and the hijackers from their personal effects. The Questioned Documents Unit received more than 1,600 specimens, including burnt, water-damaged, and fragmented documents. During the first 30 days of the investigation, the Computer Analysis Response Team examined more than 35 terabytes of data from computers and Internet files. Laboratory employees were dispatched around the world to assist in the investigations."

A special section of the report provides a detailed assessment of the FBI Lab's varied duties and crime scenes on and beyond September 11.


[Images: Bin Laden | FBI Lab]



THURSDAY

Battle of the Balls: As a debatably typical American sports fan raised on baseball, the NFL, NBA and other assorted events, I learned a lesson one summer years ago -- never visit European relatives during the World Cup tournament, at least not if you expect or want to have their attention. On the other hand, when their national soccer ... err, football ... team takes the field ... errr, pitch ... you've pretty much got your run of the place, possibly even the city. And depending on the tournament round, maybe even the entire country.

Depending on your desired relationship with any friends and/or relatives you might have the chance to visit in the greater football/soccer-bred (or ill-bred, in the case of some countries' fanatical fans) parts of the world during the next three or four weeks beginning today, this is either a very good or very bad time to drop by. In addition, because of the world time zone differences, serious World Cup watchers may be 'zoned out' for extended periods, ignorant of local time while fixated on watching live televised coverage at whatever hour is required.

The Online Herald, Web component for Scotland's "leading quality" newspaper, points out that "Football fever has officially been recognised as an illness by the WHO (World Hype Organisation)," adding the diagnosis:

"Its symptoms include early wakening (some World Cup kickoffs are at 7.30am). It has a disastrous effect on the old and office managers are braced for an explosion of funerals for grannies in afternoons in June."

[Image: World Cup 2002 guide]

Catering to the needs of its readers, the Herald offers "some complementary medicine" in the form of a "dazzling 48-page full colour companion" to World Cup 2002, being held in Japan and South Korea from May 31 to June 30. The guide is available for download in PDF in three parts, covering the tournament's atmosphere, statistics and group-by-group analysis. It also includes a printable wall chart.

The medicinal, printable tonic comes with the following Herald prescription:

"To avoid the draining effects of the fever while vicariously participating in the greatest show on earth, simply sit on couch, point remote control at box in corner, and lay this supplement snugly on your lap. Enjoy."

In some ways, World Cup enthusiasts have much in common with the stereotypical soccer-impaired U.S. sports fan.

FOOT-Note: The official 2002 FIFA World Cup Web site hosts a printable tournament schedule in PDF, as well as a series of other downloadable, portable docs on rules, disciplinary code, referee signals, team rankings, event history and, of course, on this year's tournament.


[Images: World Cup 2002 | Ref signals]



FRIDAY

All the PDFs All the Time?: As veteran PDF users have been aware, for far too long a serious weakness -- and compelling argument against the use of PDFs on Web sites -- was that content published in PDF files was hidden away in the so-called "Invisible Web," rarely to appear in search engine search results. Tools used to index the Web did not index the full-text of PDF files, and thus people searching for certain kinds of information couldn't discover it.

We reported last year when that unseen cyber barrier started to crack, with Google the first major search site to index PDFs and produce hits in search results. Later Google added automatic PDF-to-HTML conversion, as we also featured. More recently we've highlighted several other promising signs at other search engines and directories, and we continue to closely monitor sites that report on the latest developments in search technologies, on the lookout for additional breakthroughs. A recent news item headlined "Searching PDF on the Web" at one of our favorite, most-respected resource sites caught our attention.

The Virtual Acquisition Shelf & News Desk Weblog in fact references a report from another search-monitoring site in noting that the "AllTheWeb" search site "is now providing access to .pdf (Adobe Acrobat) material." The chief sleuth of the "Search Engine Showdown" site shared some of his own observations in comparing the PDF searching capabilities of Google and AllTheWeb, as follows:

  • AllTheWeb indexes and provides access to the full-text of PDF files in its database
  • Google does not index and provide access to the full-text of PDF files

This was certainly news to us, having at least been led to believe Google was indexing the full text of PDF files. Alas, according to these reports, "Google tends to stop indexing at about 120K," meaning that if information you're searching for is located beyond that magic cutoff point within a certain PDF file, Google presumably wouldn't find it. As evidence, the VASND points to a 256 KB, 55-page PDF file that can be found in Google's database. Clicking Google's "View as HTML" feature produces only the first 35 pages in Web-coded format.

After jumping to the main "AllTheWeb Does PDFs" news item at Search Engine Showdown, we found further details, including references and links to another PDF that the author tested to reach the conclusions. The search example uses a phrase -- "truck struck the cherry picker basket" -- contained near the end of a PDF file that AllTheWeb can find (in fact, it finds the same PDF at two different sites) while Google produces no match because, the author concludes, "the phrase occurs after the indexing stops." Disappointed to learn this, we decided for our own enlightenment to replicate the test process.

AllTheWeb does indeed find the proper PDF file ...

[Image: All The Web results]

... in which we find the phrase four pages from the end of the document:

[Image: Search PDF]

Google does not. End of story -- conclusive proof that Google's failure was due to not having indexed beyond that point in the PDF? One more simple test might help to support the premise -- searching for a different phrase located even further back in the same file:

[Image: Test search phrase]

AllTheWeb, as expected, produced a match for the phrase "requiring pedestrian workers to wear high visibility headgear" which is part of the next-to-last sentence in the PDF. Any chance that Google, which had whiffed on the previous phrase located earlier in the same document, would produce a match on this second phrase?

[Image: Google results]

Bingo! Google did produce a match for the same PDF as AllTheWeb had twice located, which seemed to nix the theory that -- at least for this file -- indexing had been incomplete. To double-check that, now that we had a match, we could use Google's "View as HTML" to see how much of the file it would convert. Result: every last word. Further, Google's search results display was somewhat more useful than the one produced by AllTheWeb. Google highlighted the proper title (almost) of the document's content found on page 1: "Build Safer Highway Work Zones" ...

[Image: PDF Report Headline]

... while AllTheWeb highlighted the more cryptic text -- "workzoneromannumeraplages" -- entered in the Title field of the PDF's Document Information fields.

[Image: PDF Document Info]

So where do we stand in terms of understanding PDF indexing on the Web? Our limited testing only casts doubt on a specific conclusion about a specific PDF file. We're still not clear why Google does not produce a hit for the first phrase when it successfully matches a different phrase located deeper in the same file. Further testing may reveal a reason. As to the general observation that Google *tends* not to index beyond some arbitrary cutoff point in at least some PDFs, a great deal more testing and discussion would be necessary. Search-centric sites and close observers of these technologies are in a better position to undertake such efforts, and we certainly hope that one or more of them will so we have a better understanding of how searchable our PDFs really are on the Internet.
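For readers who want to repeat this kind of check on their own files, the reasoning behind our two-phrase test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the PDF's text is assumed to have already been extracted to a plain string by some other tool, and the "about 120K" figure is read here as bytes of extracted text, which the reports we cite do not confirm. The idea is simply that a single hard cutoff predicts everything before it is found and everything after it is missed, so a hit on a deeper phrase alongside a miss on a shallower one contradicts the cutoff as the sole explanation.

```python
from typing import Iterable, Optional, Tuple

# Assumption: "about 120K" is interpreted as bytes of extracted text.
CUTOFF_BYTES = 120 * 1024


def phrase_offset(text: str, phrase: str) -> Optional[int]:
    """Return the UTF-8 byte offset of the first exact match, or None."""
    index = text.find(phrase)
    if index == -1:
        return None
    return len(text[:index].encode("utf-8"))


def beyond_cutoff(offset: int, cutoff: int = CUTOFF_BYTES) -> bool:
    """True if a phrase starting at this offset lies past the assumed cutoff."""
    return offset >= cutoff


def consistent_with_cutoff(observations: Iterable[Tuple[int, bool]]) -> bool:
    """Check (byte_offset, was_found) pairs against a single hard cutoff.

    A lone cutoff cannot explain the results if any phrase that was
    found sits deeper in the file than a phrase that was missed.
    """
    pairs = list(observations)
    found = [offset for offset, hit in pairs if hit]
    missed = [offset for offset, hit in pairs if not hit]
    if not found or not missed:
        return True
    return max(found) < min(missed)


if __name__ == "__main__":
    # Synthetic stand-in for extracted PDF text (not the real document).
    sample = (
        "x " * 70000
        + "truck struck the cherry picker basket "
        + "y " * 1000
        + "requiring pedestrian workers to wear high visibility headgear."
    )
    first = phrase_offset(sample, "truck struck the cherry picker basket")
    second = phrase_offset(
        sample, "requiring pedestrian workers to wear high visibility headgear"
    )
    # Our observed pattern: first phrase missed, deeper phrase found.
    print(consistent_with_cutoff([(first, False), (second, True)]))
```

Feeding in the pattern we observed (earlier phrase missed, later phrase found) makes the check fail, which matches our conclusion that the cutoff theory alone does not account for this particular file. It says nothing about why the first phrase was missed.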

NOTE: We'd love to hear from anyone who has a theory on -- or better yet, who can definitively explain -- why Google missed on the first phrase but found the second in the same file. We'll report back here if/when that's resolved.

In the meantime, we'll just have to keep hoping that some day Adobe will develop a way for PDF indexes created by Acrobat Catalog to be searchable on the Web.




