Unix Regular Expressions


Constructing Regular Expressions

In this section we'll discuss the basics of regular expressions. Before we dive into interpretation rules, let's examine some characteristics of regular expressions.

Most characters in a regular expression simply match themselves. If you string several characters in a row, they must match in order. So, if you write the pattern:

/Bart/

it won't match unless the string contains the substring "Bart" somewhere. The following pattern gives a very rough check of whether a string could be an e-mail address, because it only requires an "@" somewhere in the string:

/@/

As we proceed, we will discuss much more reliable patterns for e-mail verification.
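As a minimal sketch of the two patterns above, the following JavaScript uses the standard test() method of a regular expression. The helper names containsBart and hasAtSign are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helpers for illustration; the names are not from the article.
function containsBart(text) {
  // Literal characters must appear in this exact order, anywhere in the string.
  return /Bart/.test(text);
}

function hasAtSign(text) {
  // Rough e-mail check: only confirms that "@" occurs somewhere.
  return /@/.test(text);
}

console.log(containsBart("El Barto"));        // true: "Bart" is a substring
console.log(containsBart("bart"));            // false: matching is case-sensitive
console.log(hasAtSign("homer@example.com"));  // true
console.log(hasAtSign("@"));                  // true, though "@" is not an address
```

The last call shows why /@/ is only a rough test: any string containing "@" passes.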

Some characters don't match themselves; they are metacharacters with a special meaning. You can match these characters literally by placing a backslash in front of them. For example, "\\" matches a backslash and "\$" matches a dollar sign. Here's the list of metacharacters:

\ | () [ { ^ $ * + ? .
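The effect of escaping can be sketched in JavaScript as follows. The variable names and sample strings are assumptions made for this example. Note that inside a JavaScript string literal a backslash must itself be written as "\\", so "C:\\temp" contains a single backslash.

```javascript
// Escaped metacharacters match themselves literally.
var literalDollar = /\$/;      // a dollar sign
var literalBackslash = /\\/;   // a single backslash
var literalDot = /\./;         // a period

// Unescaped, the period is a metacharacter that matches any character
// other than a line terminator.
var anyCharacter = /./;

console.log(literalDollar.test("Price: $5"));      // true
console.log(literalBackslash.test("C:\\temp"));    // true
console.log(literalDot.test("no period here"));    // false
console.log(anyCharacter.test("no period here"));  // true
```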

A backslash also turns an alphanumeric character into a metacharacter. So whenever you see a backslash followed by an alphanumeric character:

\d \D \w \W \t \s \3

you'll know that the sequence matches something strange. For example, \t matches a tab character, while \d matches any digit. Some sequences are actually zero characters wide. For instance, "\b" matches a word boundary, which is not a real character -- it is zero characters wide.
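A short JavaScript sketch of three of these sequences follows. The sample strings are made up for illustration. The final line inserts a marker at every word boundary, which makes it visible that \b matches a position rather than a character.

```javascript
var hasDigit = /\d/;          // any digit
var hasTab = /\t/;            // a tab character
var wholeWord = /\bBart\b/;   // "Bart" with a word boundary on each side

console.log(hasDigit.test("Room 101"));        // true
console.log(hasDigit.test("Room"));            // false
console.log(hasTab.test("a\tb"));              // true
console.log(wholeWord.test("Bart Simpson"));   // true
console.log(wholeWord.test("Bartholomew"));    // false: no boundary after "Bart"

// \b consumes no characters, so nothing in "Bart" is replaced;
// the marker is only inserted at the two boundary positions.
var marked = "Bart".replace(/\b/g, "|");
console.log(marked);                           // |Bart|
```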

Regular expressions consist mostly of assertions: even a plain character simply asserts that it matches itself. We'll reserve the term "assertions" for the zero-width ones, and call assertions that have width atoms. As there is no standard terminology, we use that of "Programming Perl." As a matter of fact, most of our explanations are based on this great book.

Regular expressions can include non-assertions, such as the alternation operator, which is indicated with a vertical bar:

/Homer|Marge|Bart|Lisa|Maggie/

Any of those strings can trigger a match. That is, the preceding expression matches all of the following strings:

  • "Homer"
  • "Bart"
  • "Lisa Simpson"
  • "Simpson, Marge"
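The alternation above can be tried out with a sketch like the following. The variable names and the non-matching string "Ned Flanders" are assumptions added for contrast.

```javascript
var familyMember = /Homer|Marge|Bart|Lisa|Maggie/;

// The four strings listed above.
var samples = ["Homer", "Bart", "Lisa Simpson", "Simpson, Marge"];

// Collect a true/false result for each sample.
var results = [];
for (var i = 0; i < samples.length; i++) {
  results.push(familyMember.test(samples[i]));
}
console.log(results);                            // [ true, true, true, true ]

// A string containing none of the alternatives does not match.
console.log(familyMember.test("Ned Flanders"));  // false
```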

You can group alternatives with parentheses, as in the following expression:

/(Homer|Marge|Bart|Lisa|Maggie) Simpson/

A common mistake is to forget the parentheses:

/Homer|Marge|Bart|Lisa|Maggie Simpson/

Unlike the previous pattern, this one matches the following strings, because "Simpson" belongs only to the "Maggie" alternative:

  • "Simpson, Bart"
  • "Marge"
  • "Homer"
but it does not match the string "Maggie".
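The difference between the grouped and ungrouped patterns can be checked with a sketch such as this one. The variable names are hypothetical.

```javascript
var grouped = /(Homer|Marge|Bart|Lisa|Maggie) Simpson/;
var ungrouped = /Homer|Marge|Bart|Lisa|Maggie Simpson/;

// With parentheses, " Simpson" must follow whichever name matched.
console.log(grouped.test("Bart Simpson"));      // true
console.log(grouped.test("Simpson, Bart"));     // false

// Without parentheses, the last alternative is the whole of "Maggie Simpson",
// and the other four names match on their own.
console.log(ungrouped.test("Simpson, Bart"));   // true
console.log(ungrouped.test("Marge"));           // true
console.log(ungrouped.test("Maggie"));          // false
console.log(ungrouped.test("Maggie Simpson"));  // true
```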

Quantifiers say how many times the preceding item should match in a row. Here are a few quantifiers:

* + ? {4,8} {5,}

Quantifiers can only be placed after atoms, that is, assertions with width. A quantifier attaches only to the immediately preceding atom, so if you want it to apply to multiple characters, you must group them together, like this:

/(Bart){3}/

This pattern matches "BartBartBart", whereas the following pattern matches the string "Barttt":

/Bart{3}/
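The following sketch contrasts the two patterns and also tries the range quantifiers {4,8} and {5,} from the list above. The variable names and the test strings of repeated "a" characters are assumptions made for illustration.

```javascript
var groupedRepeat = /(Bart){3}/;  // {3} applies to the whole group "Bart"
var letterRepeat = /Bart{3}/;     // {3} applies only to the final "t"

console.log(groupedRepeat.test("BartBartBart"));  // true
console.log(groupedRepeat.test("Barttt"));        // false
console.log(letterRepeat.test("Barttt"));         // true
console.log(letterRepeat.test("BartBartBart"));   // false

// {4,8} needs at least four repetitions in a row; {5,} needs at least five.
var fourToEight = /a{4,8}/;
var fiveOrMore = /a{5,}/;

console.log(fourToEight.test("aaa"));    // false: only three
console.log(fourToEight.test("aaaa"));   // true
console.log(fiveOrMore.test("aaaa"));    // false: only four
console.log(fiveOrMore.test("aaaaaa"));  // true
```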

Created: October 23, 1997
Revised: December 4, 1997
URL: http://www.webreference.com/js/column5/construct.html