JS Regular Expressions

Regular expressions, commonly known as "regex" or "RegExp", are specially formatted text strings used to find patterns in text. They are among the most powerful tools available for efficient text processing and manipulation. For instance, they can be used to verify whether data entered by the user (e.g. name, email, phone number) is in the correct format, to find or replace a matching string within text content, and so on.

JavaScript supports Perl-style regular expressions. Why Perl-style? Because Perl (Practical Extraction and Report Language) was the first mainstream programming language to provide integrated support for regular expressions, and it is well known for its strong regular expression support and its extraordinary text processing and manipulation capabilities.

Before delving into the world of regular expressions, the table below gives a brief overview of JavaScript's most commonly used built-in methods for pattern matching.

Function    What it Does
exec()      Searches for a match in a string. It returns an array of information or null on mismatch.
test()      Tests whether a string matches a pattern. It returns true or false.
search()    Searches for a match within a string. It returns the index of the first match, or -1 if not found.
replace()   Searches for a match in a string, and replaces the matched substring with a replacement string.
match()     Searches for a match in a string. It returns an array of information or null on mismatch.
split()     Splits a string into an array of substrings using a regular expression.

Note: The methods search(), replace(), match() and split() are String methods that accept a regular expression as a parameter, while the methods exec() and test() are RegExp methods that accept a string as a parameter. 
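
As a rough illustration of the methods in the table above, the sketch below runs each of them against the same input. The sample string and the variable names are made up for this example; the pattern has no g flag, so repeated calls do not depend on earlier ones.

```javascript
// Illustrative pattern and text (not from the tutorial's own examples)
const pattern = /o/;
const text = "Hello World";

// RegExp methods: called on the pattern, take a string
const execResult = pattern.exec(text);   // match array for the first "o", with an index property
const testResult = pattern.test(text);   // true

// String methods: called on the string, take a pattern
const searchResult = text.search(pattern);        // 4
const replaceResult = text.replace(pattern, "0"); // "Hell0 World" (first match only)
const matchResult = text.match(pattern);          // same shape as the exec() result
const splitResult = text.split(pattern);          // ["Hell", " W", "rld"]

console.log(execResult[0], execResult.index); // o 4
console.log(testResult, searchResult);        // true 4
console.log(replaceResult);                   // Hell0 World
console.log(splitResult);
```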

 

Defining Regular Expressions

Regular expressions are represented by the RegExp object in JavaScript, which is a native object like String, Array, etc. There are two ways of creating a new RegExp object: the literal syntax and the RegExp() constructor.

The literal syntax wraps the regular expression pattern in forward slashes (i.e. /pattern/), while the constructor syntax takes the pattern as a quoted string (i.e. "pattern"). The example here illustrates both ways of creating a regular expression that matches any string that begins with "Mr.":

// Literal syntax
var regex = /^Mr\./;

// Constructor syntax (note the doubled backslash inside the string)
var regex = new RegExp("^Mr\\.");

As you can see from the example above, the regular expression literal syntax is shorter and easier to read. It is therefore preferable to use the literal syntax, and this tutorial uses it throughout.

Note: If you are using the constructor syntax, you need to double-escape special characters: to match ".", you need to write "\\." instead of "\.". If there is only one backslash, JavaScript's string parser interprets it as an escape character and removes it before the pattern reaches the regular expression engine.
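
The effect of double-escaping can be checked with a small sketch like the one below. The variable names and the "Mrs Smith" test string are chosen only for illustration.

```javascript
const literal = /^Mr\./;                    // literal syntax: one backslash
const constructed = new RegExp("^Mr\\.");   // constructor syntax: doubled backslash
const underEscaped = new RegExp("^Mr\.");   // the string parser drops the lone backslash

console.log(literal.source === constructed.source); // true: both patterns are ^Mr\.
console.log(underEscaped.source);                   // ^Mr.   (the dot is a wildcard again)

console.log(constructed.test("Mrs Smith"));  // false: a literal "." is required after "Mr"
console.log(underEscaped.test("Mrs Smith")); // true: "." matches the "s"
```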

 

Pattern Matching with Regular Expression

A regular expression pattern is made up of letters, digits, punctuation marks, etc., plus a set of special regular expression characters (do not confuse them with HTML special characters).

The characters that are given special meaning within a regular expression are:

. * ? + [ ] ( ) { } ^ $ | \

You need to backslash these characters whenever you want to use them literally. For instance, if you want to match ".", you need to write "\.". All other characters automatically assume their literal meanings.
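
A short sketch of why escaping matters is shown below. The escapeRegExp helper is a hypothetical utility written for this example (it is not a built-in function); it backslashes every special character so that arbitrary text can be used as a literal pattern.

```javascript
const unescaped = /3.14/;  // "." matches any single character
const escaped = /3\.14/;   // "\." matches only a literal dot

console.log(unescaped.test("3514")); // true
console.log(escaped.test("3514"));   // false
console.log(escaped.test("3.14"));   // true

// Hypothetical helper: backslash-escape every special character in a string
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1=2";
const safePattern = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp(userInput));             // 1\+1=2
console.log(safePattern.test("1+1=2"));           // true
console.log(new RegExp(userInput).test("1+1=2")); // false: "+" acts as a quantifier
```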

The sections below describe the various options available for formulating patterns:

 

Character Classes

Square brackets surrounding a pattern of characters are called a character class (e.g. [abc]). A character class always matches a single character out of the specified list of characters. This means the expression [abc] matches only the character a, b, or c.

Negated character classes can also be defined to match any character except those contained within the brackets. A negated character class is defined by placing a caret (^) symbol immediately after the opening bracket, such as [^abc], which matches any character except a, b, and c.

A range of characters can also be defined by using the hyphen (-) character inside a character class (e.g. [0-9]).

The table below shows some examples of the character classes:

RegExp      What it Does
[abc]       Matches any one of the characters a, b, or c.
[^abc]      Matches any one character other than a, b, or c.
[a-z]       Matches any one character from lowercase a to lowercase z.
[A-Z]       Matches any one character from uppercase A to uppercase Z.
[a-zA-Z]    Matches any one letter, lowercase or uppercase. (A range such as [a-Z] is out of order and is a syntax error in JavaScript.)
[0-9]       Matches a single digit between 0 and 9.
[a-z0-9]    Matches a single character between a and z or between 0 and 9.

The example below shows how to find out whether a pattern exists within a string, using a regular expression with the JavaScript test() method:

var regex = /ca[kf]e/;
var str = "He was eating cake in the cafe.";

// Test the string against the regular expression
if(regex.test(str)) {
    alert("Match found!");
} else {
    alert("Match not found.");
}

You can also add the global flag ‘g’ to a regular expression to find all matches in a string:

var regex = /ca[kf]e/g;
var str = "He was eating cake in the cafe.";
var matches = str.match(regex);
alert(matches.length); // Outputs: 2
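
Ranges and negated character classes can be tried out in the same way. The snippet below is a minimal sketch with a made-up sample string, using console.log instead of alert() so it also runs outside a browser.

```javascript
const text = "Room 42B, floor 7";

const digits = text.match(/[0-9]/g);             // every single digit
const digitsOnly = text.replace(/[^0-9]/g, "");  // remove everything that is NOT a digit
const upperCase = text.match(/[A-Z]/g);          // every uppercase letter

console.log(digits);     // ["4", "2", "7"]
console.log(digitsOnly); // "427"
console.log(upperCase);  // ["R", "B"]
```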

Tip: Regular expressions are not exclusive to JavaScript. Other languages such as Java, Perl, Python, PHP, etc. use very similar notation for finding patterns in text.

 

Predefined Character Classes

Some character classes, such as digits, letters, and whitespace, are used so frequently that there are shortcut names for them. The table below lists those predefined character classes:

Shortcut    What it Does
.           Matches any single character except newline \n.
\d          Matches any digit character. Same as [0-9]
\D          Matches any non-digit character. Same as [^0-9]
\s          Matches any whitespace character (space, tab, newline or carriage return character). Roughly the same as [ \t\n\r]
\S          Matches any non-whitespace character. Roughly the same as [^ \t\n\r]
\w          Matches any word character (defined as a to z, A to Z, 0 to 9, and the underscore). Same as [a-zA-Z_0-9]
\W          Matches any non-word character. Same as [^a-zA-Z_0-9]

The example below shows how to find whitespace in a string and replace it with a hyphen (-) character, using a regular expression with the JavaScript replace() method:

var regex = /\s/g;
var replacement = "-";
var str = "Earth revolves around\nthe\tSun";

// Replace spaces, newlines and tabs
document.write(str.replace(regex, replacement) + "<hr>");

// Replace only spaces
document.write(str.replace(/ /g, "-"));
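
The other shortcuts behave in a similar way. The following sketch uses invented sample strings and variable names to show \D, \s and \W in action.

```javascript
const order = "Order #123 shipped on 2024-05-17";

// \D is the opposite of \d: strip every non-digit character
const digitsOnly = order.replace(/\D/g, "");
console.log(digitsOnly); // "12320240517"

// \s matches a space, a tab and a newline alike
const pieces = "a b\tc\nd".split(/\s/);
console.log(pieces); // ["a", "b", "c", "d"]

// \W finds anything that is not a letter, digit or underscore
const plainName = /\W/.test("user_name1");
const dashedName = /\W/.test("user-name");
console.log(plainName);  // false
console.log(dashedName); // true (the hyphen is a non-word character)
```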

 

Repetition Quantifiers

In the previous section of this tutorial, you learned how to match a single character in a variety of ways. But what if you want to match more than one character? For instance, suppose you want to find words containing one or more occurrences of the letter p, or words containing at least two p's, and so on.

This is where quantifiers come into play. With quantifiers, you can specify how many times a character in a regular expression should match. Quantifiers can be applied to individual characters, as well as to character classes and to groups of characters enclosed in parentheses.

The table below lists the various ways you can quantify a particular pattern:

RegExp      What it Does
p+          Matches one or more occurrences of the letter p.
p*          Matches zero or more occurrences of the letter p.
p?          Matches zero or one occurrence of the letter p.
p{2}        Matches exactly two occurrences of the letter p.
p{2,3}      Matches at least two occurrences of the letter p, but not more than three occurrences.
p{2,}       Matches two or more occurrences of the letter p.
p{0,3}      Matches at most three occurrences of the letter p. (JavaScript requires the lower bound; p{,3} is not treated as a quantifier.)

The regular expression in the example below splits the string at a comma, a sequence of commas, whitespace, or a combination thereof, using the JavaScript split() method:

var regex = /[\s,]+/;
var str = "My favourite colors are red, green and blue";
var parts = str.split(regex);

// Loop through parts array and display substrings
for(var part of parts){
    document.write("<p>" + part + "</p>");
}
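
To see what each quantifier actually consumes, it can help to print the matched text. The sketch below does that with a few made-up words; each pattern starts with a literal "a" so that every match begins at the same place.

```javascript
console.log("appple".match(/ap+/)[0]);      // "appp"  (one or more p, as many as possible)
console.log("appple".match(/ap{2}/)[0]);    // "app"   (exactly two p)
console.log("appple".match(/ap{2,3}/)[0]);  // "appp"  (two to three p)
console.log("apppple".match(/ap{0,3}/)[0]); // "appp"  (at most three p)
console.log("ale".match(/ap*/)[0]);         // "a"     (zero p is fine)
console.log("apple".match(/ap?/)[0]);       // "ap"    (at most one p)
console.log("ale".match(/ap+/));            // null    (at least one p is required)

// In JavaScript a missing lower bound is not a quantifier:
// /p{,3}/ looks for the literal text "p{,3}"
console.log(/p{,3}/.test("ppp"));   // false
console.log(/p{,3}/.test("p{,3}")); // true
```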

 

Position Anchors

There are certain situations where you want to match at the beginning or end of a line, word, or string. To do this you can use anchors. Two common anchors are the caret (^), which represents the start of the string, and the dollar sign ($), which represents the end of the string.

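RegExp      What it Does
^p          Matches the letter p at the beginning of the string (or of each line when the m modifier is used).
p$          Matches the letter p at the end of the string (or of each line when the m modifier is used).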

The example below uses the JavaScript test() method to display only those names in the names array that start with the letter "J":

var regex = /^J/;
var names = ["James Bond", "Clark Kent", "John Rambo"];

// Loop through names array and display matched names
for(var name of names) {
    if(regex.test(name)) {
        document.write("<p>" + name + "</p>")
    }
}
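
The dollar anchor works the same way at the other end of the string, and combining both anchors forces the whole string to match. The file names and the five-digit check below are illustrative assumptions, not part of the original example.

```javascript
const files = ["report.pdf", "photo.png", "pdf-notes.txt"];

// $ anchors the match to the end of the string
const pdfFiles = files.filter(function (file) {
    return /\.pdf$/.test(file);
});
console.log(pdfFiles); // ["report.pdf"]

// ^ and $ together: the entire string must consist of exactly five digits
const fiveDigits = /^\d{5}$/;
console.log(fiveDigits.test("90210"));     // true
console.log(fiveDigits.test("902101"));    // false (six digits)
console.log(fiveDigits.test("zip 90210")); // false (extra text before the digits)
```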

 

Pattern Modifiers (Flags)

Pattern modifiers control the way a pattern match is handled. They are placed directly after the regular expression; for instance, if you want to search for a pattern in a case-insensitive manner, you can use the i modifier, like this: /pattern/i.

The table below lists some of the most commonly used pattern modifiers.

Modifier    What it Does
g           Performs a global match, i.e. finds all occurrences.
i           Makes the match case-insensitive.
m           Changes the behavior of ^ and $ to match against a newline boundary (i.e. start or end of each line within a multiline string), instead of a string boundary.
s           Changes the behavior of . (dot) to match all characters, including newlines.

Note: The o modifier (evaluate the expression only once) and the x modifier (allow whitespace and comments within a regular expression for clarity) come from Perl and are not supported in JavaScript.

The example below shows how to use the g and i modifiers in a regular expression to perform a global and case-insensitive search with the JavaScript match() method:

var regex = /color/gi;
var str = "Color red is more visible than color blue in daylight.";
var matches = str.match(regex); // global, case-insensitive match
console.log(matches);
// expected output: ["Color", "color"]

Similarly, the example below shows how to match at the beginning of every line in a multi-line string using the ^ anchor and the m modifier with the JavaScript match() method:

var regex = /^color/gim;
var str = "Color red is more visible than \ncolor blue in daylight.";
var matches = str.match(regex); // global, case-insensitive, multiline match
console.log(matches);
// expected output: ["Color", "color"]
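
The s and m modifiers can be compared side by side with a sketch like the following. The two-line sample string is invented for this example, and the s modifier assumes an ES2018-capable engine.

```javascript
const text = "first line\nsecond line";

// Without s, the dot stops at the newline; with s, it crosses it
const withoutS = /first.*second/.test(text);
const withS = /first.*second/s.test(text);
console.log(withoutS); // false
console.log(withS);    // true

// Without m, $ matches only at the end of the string; with m, at the end of each line
const endOfString = text.match(/line$/g);
const endOfLines = text.match(/line$/gm);
console.log(endOfString); // ["line"]
console.log(endOfLines);  // ["line", "line"]

// With the constructor syntax, modifiers are passed as a second argument
const fromConstructor = new RegExp("color", "gi");
console.log(fromConstructor.flags); // "gi"
```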

 

Alternation

Alternation allows you to specify alternative versions of a pattern. In a regular expression, alternation works just like the OR operator in an if-else conditional statement.

You can specify alternation using a vertical bar (|). For instance, the regexp /fox|dog|cat/ matches the string "fox", or the string "dog", or the string "cat". Here is an example:

var regex = /fox|dog|cat/;
var str = "The quick brown fox jumps over the lazy dog.";
var matches = str.match(regex);
console.log(matches);
// expected output: ["fox", index: 16, ...]

Note: In JavaScript, alternatives are evaluated from left to right until a match is found. If the left alternative matches, the right alternative is ignored completely, even if it would also match.
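
The left-to-right rule becomes visible when one alternative is a prefix of another. In the sketch below (the word "category" is just a convenient sample), the order of the alternatives changes what is matched.

```javascript
// "cat" is tried first and succeeds, so "category" is never attempted
const shortFirst = "category".match(/cat|category/)[0];
console.log(shortFirst); // "cat"

// Putting the longer alternative first lets it win
const longFirst = "category".match(/category|cat/)[0];
console.log(longFirst); // "category"
```

When alternatives overlap like this, listing the longer one first is a common way to avoid a surprisingly short match.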

 

Grouping

Parentheses are used to group subexpressions in regular expressions, just as in mathematical expressions. Grouping allows a repetition quantifier to be applied to an entire subexpression.

For instance, in the regexp /go+/ the quantifier + is applied only to the last character o, so it matches the strings "go", "goo", etc., while in the regexp /(go)+/ the quantifier + is applied to the group of characters g and o, so it matches the strings "go", "gogo", and so on.

var regex = /(go)+/i; 
var str = "One day Gogo will go to school.";
var matches = str.match(regex); // case-insensitive match
console.log(matches);
// expected output: ["Gogo", "go", index: 8, ...]

Note: The match() method returns an array containing the entire matched string as the first element, followed by any results captured in parentheses, and the index of the whole match, only if the string matches the pattern. However, if no matches were found, it returns null.

Tip: If the regular expression includes the g flag, the match() method returns only an array containing all matched substrings rather than a match object; captured groups, the index of the whole match, and other properties are not returned.
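
The difference between the two return shapes is sketched below with a made-up date string. The last part uses matchAll(), a standard String method (ES2020) that is not covered above, as one possible way to get captured groups for every match.

```javascript
const text = "2024-05-17 and 2023-11-02";

// Without g: whole match first, then the captured groups, plus an index property
const first = text.match(/(\d{4})-(\d{2})/);
console.log(first[0], first[1], first[2], first.index); // 2024-05 2024 05 0

// With g: only the matched substrings, no groups and no index
const all = text.match(/(\d{4})-(\d{2})/g);
console.log(all); // ["2024-05", "2023-11"]

// matchAll() yields one match array (with groups and index) per match
const detailed = Array.from(text.matchAll(/(\d{4})-(\d{2})/g), function (m) {
    return { year: m[1], month: m[2], index: m.index };
});
console.log(detailed);
```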

 

Word Boundaries

If you want to search for words that begin and/or end with a pattern, you can use the word boundary character (\b). For instance, the regexp /\bcar/ matches words that start with the pattern car, and so would match cart, carrot, or cartoon, but would not match oscar.

Similarly, the regexp /car\b/ matches words ending with the pattern car, and would match oscar or supercar, but not cart. Likewise, /\bcar\b/ matches words that both start and end with the pattern car, so it can only match the word car.

The example below will highlight the words beginning with car in bold:

var regex = /(\bcar\w*)/g;
var str = "Words beginning with car: cart, carrot, cartoon. Words ending with car: oscar, supercar.";
var replacement = '<b>$1</b>';
var result = str.replace(regex, replacement);
document.write(result);
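
The three boundary placements described in this section can be compared on a single string. This is a small sketch with an invented word list, logging to the console rather than writing to the page.

```javascript
const text = "cart, carrot, oscar, supercar, car";

const startsWithCar = text.match(/\bcar\w*/g); // boundary before "car"
const endsWithCar = text.match(/\w*car\b/g);   // boundary after "car"
const exactlyCar = text.match(/\bcar\b/g);     // boundary on both sides

console.log(startsWithCar); // ["cart", "carrot", "car"]
console.log(endsWithCar);   // ["oscar", "supercar", "car"]
console.log(exactlyCar);    // ["car"]
```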