Professional JavaScript | 12
Regular Expressions
Last, but definitely not least, we come to the topic of Regular Expressions (RE), a not-so-simple pattern of characters that can be used to match a sequence of characters in a string. Combined with the right method, regular expressions can perform some pretty heavyweight text search and replace duties. This is not limited to checking that someone has typed in the right kind of phone number in a form element or validating some other kind of data. We could, for instance, search through a bank's records and add new digits to everyone's bank account number or discover the number of times Homer says "Doh!" in the screenplay for an episode of The Simpsons.
JavaScript has had built-in support for regular expressions since version 1.2, in the shape of the RegExp object and a handful of methods attached to the String object. We'll come to all of these later, but first we need to nail down exactly how you construct such an expression.
Rolling Your Own RE
When it comes to producing your own regular expressions, you have two options in JavaScript: you can write them either as literals or as objects. For example:
var myRE = new RegExp("R2D2");
var myRE = /R2D2/;
Both of these lines do the same thing: they assign to myRE a reference to a newly created RegExp object whose expression will match an instance of the sequence "R2D2" in a string. The RegExp object contains the data for the RE you specify when that object is created. Easy. However, you can't really make use of the full power of regular expressions until you learn their alphabet and syntax.
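To check that the two forms really are interchangeable, here is a minimal sketch. It uses the RegExp object's test() method and source property, which the text has not introduced yet, and the variable names and sample string are made up for illustration.

```javascript
// Two ways to build the same regular expression.
var objectRE = new RegExp("R2D2"); // constructor form: pattern given as a string
var literalRE = /R2D2/;            // literal form: pattern between forward slashes

// Hypothetical sample string, used only for this illustration.
var sample = "C3PO and R2D2 landed on Tatooine";

// test() returns true if the pattern occurs anywhere in the string.
var objectFound = objectRE.test(sample);
var literalFound = literalRE.test(sample);

// source holds the pattern text without the surrounding slashes.
var samePattern = objectRE.source === literalRE.source;
```

One difference worth keeping in mind: in the constructor form the pattern is an ordinary string, so any backslash in the pattern has to be doubled (for example, new RegExp("\\s") rather than new RegExp("\s")).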
The first point is easily learned: every RE literal is contained within a pair of forward slashes, with the exception of two switches, which are written after the closing slash.
var blankRE = / /;
The two switches mentioned earlier are g and i, and affect directly how the search to match your regular expression is conducted. g, the global switch, tells the search to find every instance of your character sequence in the target string, rather than just to find the first and then stop looking. i, meanwhile, tells the search mechanism that the search is case insensitive. For instance:
var myRE = /R2D2/; // finds first instance of R2D2
var myRE = /R2D2/i; // finds first instance of r2d2, R2d2, r2D2 or R2D2
var myRE = /R2D2/g; // finds all instances of R2D2
var myRE = /R2D2/gi; // finds all instances of r2d2, R2d2, r2D2, or R2D2
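The effect of the switches is easiest to see by running them against a string. The following sketch assumes the String object's match() method, which is covered later, and a made-up sample string; without g, match() reports the first match and its position, and with g it returns every matching substring.

```javascript
// Hypothetical sample string containing three spellings of the droid's name.
var droids = "r2d2 beeped, R2D2 whistled, and R2d2 rolled away";

// No switch: the first exact, case-sensitive match only.
var first = droids.match(/R2D2/);

// i switch: the first match in any mix of upper and lower case.
var firstAnyCase = droids.match(/R2D2/i);

// g switch: every exact, case-sensitive match.
var all = droids.match(/R2D2/g);

// Both switches: every match regardless of case.
var allAnyCase = droids.match(/R2D2/gi);
```

Here first finds the upper-case "R2D2" in the middle of the string, firstAnyCase finds the "r2d2" at the very start, all holds one entry, and allAnyCase holds all three spellings.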
With that out of the way, we'd better look and see what we can put inside the slashes.
The RE Alphabet
The alphabet for regular expressions incorporates all the alphanumeric characters, upper and lower case, and quite a few other special characters in the form of escape sequences, as shown below. Note that an escape sequence may match one or more ordinary characters or alternatively a special condition that isn't an ordinary character, like the start of a string, as we'll see in the pages to come.
| Character to Match | Corresponding Escape Sequence |
| Any alphanumeric character (a-z, A-Z, 0-9) | Itself |
| Any of . ? / \ [ ] { } ( ) + * or the vertical bar | \ followed by the character. For example, \{ matches { |
| Form feed | \f |
| New line | \n |
| Carriage return | \r |
| Horizontal tab | \t |
| Vertical tab | \v |
| ASCII character with octal character number Octal | \Octal (for example, \101 matches A) |
| ASCII character with hex character number Hex | \xHex (two hex digits; for example, \x41 matches A) |
| Control-X, where X is a letter from A to Z | \cX |
| The beginning of a line or string | ^ |
| The end of a line or string | $ |
| A word boundary | \b |
| Not a word boundary | \B |
| Single white space. (tab, space, etc) | \s |
| Single non white space | \S |
| Wildcard character. Anything but a new line | . |
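A few rows of the table can be checked directly. This is only a sketch: the sample strings and variable names are invented for illustration, and it relies on the RegExp object's test() method, which returns true when the pattern matches somewhere in the string.

```javascript
// Escaping a special character: \. matches a literal full stop,
// whereas an unescaped . is the wildcard.
var literalDot = /3\.14/;
var anyChar = /3.14/;

var dotMatchesPi = literalDot.test("3.14");      // literal dot present
var dotMatchesOther = literalDot.test("3x14");   // no literal dot here
var wildcardMatchesOther = anyChar.test("3x14"); // wildcard accepts the x

// \x followed by two hex digits matches the character with that code;
// 41 hex is the code for "A".
var hexA = /\x41/.test("A");

// \t matches a horizontal tab character.
var hasTab = /a\tb/.test("a\tb");

// The wildcard matches anything but a new line.
var dotMatchesNewline = /a.b/.test("a\nb");
```

Only dotMatchesOther and dotMatchesNewline come out false; the other four checks are true.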
You should be familiar with what the first ten items in the table will match; we encountered them at the very beginning of Chapter 1. To demonstrate the rest, let's take an example text:
Jimmy the Scot scooted his scooter through the park.
The Parky watched Jimmy do this
and we'll go through some easy examples:
var RE1 = /^Jimmy/; // matches "Jimmy" on line 1 but not on line 2
var RE2 = /his$/; // matches the "his" in "this" on line 2 but not "his" on line 1
var RE3 = /\bt/; // matches the "t" that starts "the" and "through"
// but not the "t" in "Scot" or "watched"
var RE4 = /\Bt/; // matches the "t" in "Scot" or "watched"
// but not the "t" that starts "the" or "through"
var RE5 = /t\s./; // matches "t s" in "Scot scooted" but nothing in "watched"
var RE6 = /t\S./; // matches "tch" in "watched" but not the "t s" in "Scot scooted"
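From these examples, you can see that each ordinary character or character-matching escape sequence between the forward slashes of a regular expression stands for a single character only. Some escape sequences, such as the whitespace escape sequence, will match more than one kind of normal character, but still match only one character in total. The special conditions ^, $, \b, and \B are the exception: they match a position in the string and take up no characters at all.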
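To see exactly which characters each expression picks up, the examples can be run against the two lines of sample text. The sketch below treats each line as a separate string and uses the RegExp object's exec() method, which returns the matched text and its position (or null when nothing matches); the firstMatch helper is a hypothetical convenience written for this illustration, not a built-in.

```javascript
var line1 = "Jimmy the Scot scooted his scooter through the park.";
var line2 = "The Parky watched Jimmy do this";

// Hypothetical helper: returns the text of the first match, or null.
function firstMatch(re, text) {
  var result = re.exec(text);
  return result ? result[0] : null;
}

var startLine1 = firstMatch(/^Jimmy/, line1); // "Jimmy"
var startLine2 = firstMatch(/^Jimmy/, line2); // null: Jimmy is mid-line here
var endLine2 = firstMatch(/his$/, line2);     // "his", the tail of "this"
var endLine1 = firstMatch(/his$/, line1);     // null: line 1 ends with "park."

// \b and \B match positions, so only the "t" itself is returned.
var boundaryT = /\bt/.exec(line1);    // the "t" that starts "the"
var nonBoundaryT = /\Bt/.exec(line1); // the "t" that ends "Scot"

var spaced = firstMatch(/t\s./, line1);   // "t s" from "Scot scooted"
var unspaced = firstMatch(/t\S./, line2); // "tch" from "watched"
```

Note that boundaryT[0] and nonBoundaryT[0] are both just "t"; the two results differ only in their index property, which records where in the line the match was found.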
Created: February 12, 2001
Revised: February 21, 2001
