PDF In-Depth

JavaScript - Counting Occurrences of a Particular Token

June 01, 2001


Sometimes, you need to know how many times a particular token occurs in a given string. For example, you may need to know how many times the word "California" is used in a given block of text. Or you may want to know how many e-mail addresses exist in a given string. The hard way to find this out would be to set up some kind of loop that goes through the string character by character or line by line, looking for (and tallying) occurrences of a given token. A far easier thing to do is use the match() method of JavaScript's String object. The match() method is extremely powerful for this, because first of all, it lets you search on a regular expression; and secondly, when the regular expression carries the global flag, it returns all the "match substrings" found, in an array. An example will make this clear.

Suppose you have the following String and you want to count the number of e-mail addresses in it:

var str = "kt@acroforms.com, jon2456@aol.com, ed@ed.com";
var matches = str.match(/@/g); // try to match '@'
var count = matches.length; // this is how many!

If one can make the (sometimes safe) assumption that "at" signs (@) occur only inside e-mail addresses, then we can simply do a search on @, as in str.match(/@/g). Note the lowercase 'g', which means to apply the search globally to the entire string. This is important, because if you fail to include this flag, the search will end after the first successful match. That's not what we want. We want ALL matches to be found. Remember that when the 'g' flag is present and something matches, match() returns an Array: the Array of matches! The length of this Array is equal to the number of matches found. One caveat: if nothing matches at all, match() returns null rather than an empty Array, so check for null before reading length.
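As a minimal sketch of the counting idea, the snippet below wraps match() in a small helper that also copes with the case where nothing matches. The name countMatches is hypothetical and not part of JavaScript itself; the second sample string is made up for illustration.

```javascript
// countMatches is a hypothetical helper name, not a built-in.
// It assumes the pattern passed in carries the 'g' flag.
function countMatches(text, pattern) {
  // With 'g', match() returns an Array of matches, or null when none are found.
  var found = text.match(pattern);
  return found ? found.length : 0;
}

var str = "kt@acroforms.com, jon2456@aol.com, ed@ed.com";

var total = countMatches(str, /@/g);                 // 3
var none = countMatches("no addresses here", /@/g);  // 0

// Without 'g', the search stops at the first match,
// so the returned Array holds just that one match.
var firstOnly = str.match(/@/).length;               // 1

console.log(total, none, firstOnly);
```

Returning 0 for the null case keeps the caller from having to test the result before using it as a number.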

A purist would scoff at the sloppiness of this approach, complaining that @ signs do not always equal e-mail addresses. To address this problem, you might want to try a regular expression designed to home in on true e-mail addresses:

var str = "kt@acroforms.com, jon2456@aol.com, ed@ed.com";
var m = str.match(/\w+[.]?\w*@\w+[.]\w+/g); // regex for e-mail addresses
var count = m.length; // the count

Things to note

Bear in mind, a dyed-in-the-alpaca regex dilettante would still scoff at this approach, since the above regular expression is hardly a failsafe regex for checking e-mail addresses. Nevertheless, it's better than what we had. At least now we're looking for one or more word characters (\w+) followed optionally by a period and zero or more word characters, followed by an @, followed by one or more word characters, a period, and one or more word characters.
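To see where this regular expression succeeds and where it falls short, here is a small sketch that runs it against a few strings. The sample strings are invented for illustration and are not from any real data.

```javascript
var emailPattern = /\w+[.]?\w*@\w+[.]\w+/g; // the regex discussed above

// Made-up sample strings, chosen to show the regex's limits.
var samples = [
  "ed@ed.com",                 // matched in full
  "first.last@example.co.uk",  // matched only up to "example.co"
  "joe@my-host.com",           // hyphen in the domain: no match
  "meet me @ noon"             // stray @ sign: no match
];

for (var i = 0; i < samples.length; i++) {
  var found = samples[i].match(emailPattern);
  // match() returns null when nothing matches, so guard before joining.
  console.log(samples[i] + " -> " + (found ? found.join(", ") : "no match"));
}
```

The stray @ is now ignored, which is the improvement over searching on @ alone. The hyphenated domain shows the other side: \w does not cover hyphens, so a plausible address is missed.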

The derivation of a truly failsafe regex for e-mail addresses (which works across newlines) is left to the reader as an exercise. Bwa-ha-ha...
