Overview of JavaScript Regex

Definition/information:

A regular expression literal is written in the form /pattern/modifiers, where "pattern" is the regular expression itself and "modifiers" is a series of characters selecting various options. The "modifiers" part is optional.

Modifiers:

  • "/g" enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one.
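  • "/i" makes the regex match case-insensitively.
  • "/s" enables "single-line mode" (also called dotAll), in which the dot also matches newline characters. This modifier is only available in engines that implement ECMAScript 2018 or later; in older engines, use [\s\S] in place of the dot.
  • "/m" enables "multi-line mode". In this mode, the caret and dollar also match at the start and end of each line in the subject string, not only at the start and end of the whole string.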

You can combine multiple modifiers by stringing them together, as in /regex/gim. That regex is global, case-insensitive, and multi-line.
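As a rough illustration of how the modifiers change matching, here is a small sketch. The sample text and variable names are made up for this example.

```javascript
// Hypothetical sample text with three lines, used only for this sketch.
const flagSample = "Cat\ncat\nCAT";

// No modifiers: only the first exact-case match is replaced.
const firstOnly = flagSample.replace(/cat/, "dog");

// g + i: every match is replaced, regardless of case.
const allCases = flagSample.replace(/cat/gi, "dog");

// m: ^ and $ also match at line boundaries, so each line can match on its own.
const lineMatches = flagSample.match(/^cat$/gim);

console.log(firstOnly);          // "Cat\ndog\nCAT"
console.log(allCases);           // "dog\ndog\ndog"
console.log(lineMatches.length); // 3
```

Without the "m" modifier, /^cat$/gi would find nothing in this text, because the whole string is not just "cat".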

Escaping Slashes:

Since a forward slash ends the regular expression literal, any forward slash that appears inside the regex needs to be escaped with a backslash. For example, the regex 1/2 is written as /1\/2/.
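A minimal sketch of the escaped slash in use; the sample sentence is invented for illustration.

```javascript
// The forward slash inside the pattern is escaped as \/ so it does not end the literal.
const halfPattern = /1\/2/;

// Hypothetical input text.
const recipeLine = "Add 1/2 cup of sugar";

const halfFound = recipeLine.match(halfPattern);
console.log(halfFound[0]);    // "1/2"
console.log(halfFound.index); // 4
```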

RegEx and Strings:

To test if a certain RegEx matches (part of) a string, you can call the string's match() method:

if (myString.match(/regex/)) {
/* Yay! */
}

If you want to verify user input, you should use anchors to make sure that you are testing against the entire string. For example, to test whether the user entered a whole number, use:

myString.match(/^\d+$/)

/\d+/ matches any string containing one or more digits, but /^\d+$/ matches only strings consisting entirely of digits.
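The difference between the anchored and unanchored pattern can be sketched as follows. The function names are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helper: true only for strings made up entirely of digits.
function isWholeNumber(input) {
  // match() returns null when the regex does not match.
  return input.match(/^\d+$/) !== null;
}

// Hypothetical helper: without anchors, any string that merely contains a digit passes.
function containsDigits(input) {
  return input.match(/\d+/) !== null;
}

console.log(isWholeNumber("12345"));   // true
console.log(isWholeNumber("123abc"));  // false
console.log(containsDigits("123abc")); // true
```
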

To do a search and replace with RegExes, use the string's replace() method:

myString.replace(/replaceit/g, "replacement")

Using the "/g" modifier makes sure that all occurrences of "replaceit" are replaced. The second parameter is a normal string with the replacement text.

If the regexp contains capturing parentheses, you can use backreferences in the replacement text. "$1" in the replacement text inserts the text matched by the first capturing group, "$2" the second, and so on. Two-digit references up to "$99" are also recognized when the regexp has that many groups.
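A short sketch of replace() with backreferences. The input strings are invented for this example.

```javascript
// Hypothetical input: a date in YYYY-MM-DD form.
const isoDate = "2024-03-15";

// Three capturing groups: year ($1), month ($2), day ($3).
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(usDate); // "03/15/2024"

// With the g modifier, every match is rewritten, not only the first one.
const swappedPairs = "a=1, b=2".replace(/(\w)=(\d)/g, "$2=$1");
console.log(swappedPairs); // "1=a, 2=b"
```

Note that replace() returns a new string; the original string is left unchanged.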

Finally, a string's split() method lets you split the string into an array of strings, using a regular expression to determine the positions at which the string is split. For example:

myArray = myString.split(/,/)

This splits a comma-delimited list into an array. The commas themselves are not included in the resulting array of strings.
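A small sketch of split() with a regex delimiter, using made-up input. The second call shows one reason to prefer a regex over a plain string delimiter.

```javascript
// Hypothetical comma-delimited input.
const csvLine = "red,green,blue";
const colours = csvLine.split(/,/);
console.log(colours); // ["red", "green", "blue"]

// The delimiter pattern can also absorb optional whitespace around each comma.
const spacedLine = "red , green,blue";
const trimmedColours = spacedLine.split(/\s*,\s*/);
console.log(trimmedColours); // ["red", "green", "blue"]
```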

RegEx objects:

Each JavaScript global environment (for example, each browser window or frame) has one built-in RegExp constructor object. Usually, you will not use this object directly. The easiest way to create a new RegExp instance is to use the regex literal syntax:

myregexp = /regex/

If you have the regular expression in a string (e.g. because it was typed in by the user), you can use the RegExp constructor:

myregexp = new RegExp(regexstring)

Modifiers can be specified as a second parameter:

myregexp = new RegExp(regexstring, "gims")

I recommend that you do not use the RegExp constructor with a literal string, because in literal strings, backslashes must be escaped. The regular expression \w+ can be created as re = /\w+/ or as re = new RegExp("\\w+"). The latter is definitely harder to read. The regular expression \\ matches a single backslash. In JavaScript, this becomes re = /\\/ or re = new RegExp("\\\\").
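The escaping difference can be checked with a short sketch. The variable names are illustrative only.

```javascript
// Literal syntax: one backslash is enough.
const wordLiteral = /\w+/;

// Constructor with a string literal: the backslash itself must be escaped.
const wordFromString = new RegExp("\\w+");
console.log(wordLiteral.source === wordFromString.source); // true

// Forgetting the extra backslash: "\w" in a string literal is just "w".
const mistakenPattern = new RegExp("\w+");
console.log(mistakenPattern.source); // "w+"

// Matching a single backslash character, written both ways.
const backslashLiteral = /\\/;
const backslashFromString = new RegExp("\\\\");
console.log(backslashLiteral.test("C:\\temp"));    // true
console.log(backslashFromString.test("C:\\temp")); // true
```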

Whichever way you create "myregexp", you can pass it to the String methods explained above instead of a literal regular expression:

myString.replace(myregexp, "replacement")

If you want to retrieve the part of the string that was matched, call the exec() function of the RegExp object that you created, e.g.:

mymatch = myregexp.exec("subject")

This function returns an array, or null if the regular expression did not match. The zeroth item in the array holds the text that was matched by the regular expression. The following items contain the text matched by the capturing parentheses in the regexp, if any. mymatch.index indicates the character position in the subject string at which the pattern matched.
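A minimal sketch of reading the result of exec(). The pattern and subject strings are made up for illustration; note the null check, since exec() returns null when nothing matches.

```javascript
// Hypothetical pattern: a name, an equals sign, and a number.
const keyValuePattern = /(\w+)=(\d+)/;
const found = keyValuePattern.exec("set width=640 now");

if (found !== null) {
  console.log(found[0]);    // "width=640" (whole match)
  console.log(found[1]);    // "width"     (first capturing group)
  console.log(found[2]);    // "640"       (second capturing group)
  console.log(found.index); // 4           (position of the match in the subject)
}

// No match: exec() returns null rather than an empty array.
const notFound = keyValuePattern.exec("nothing here");
console.log(notFound); // null
```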

Calling the exec() function also updates some properties. If the regex has the "g" modifier, the lastIndex property of that particular "myregexp" instance is set to the index just past the end of the match, which is where the next call to exec() resumes searching; it is reset to 0 when a call finds no match. Separately, the single global RegExp object carries legacy properties that describe the most recent successful match made by any regex: RegExp.lastMatch holds the matched text, RegExp.leftContext the part of the subject string to the left of the match, and RegExp.rightContext the part to the right. Because each JavaScript global environment has only one global RegExp object, these values are shared by all "myregexp" instances. They are deprecated legacy features, so prefer the array returned by exec() in new code.
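In current engines, lastIndex belongs to each regex instance and, with the "g" modifier, records where the next exec() call will start searching. A rough sketch of walking through all matches this way, with invented input and variable names:

```javascript
// The g modifier makes exec() resume from lastIndex on each call.
const digitRuns = /\d+/g;
const subjectText = "a1 b22 c333";

const positions = [];
let hit;
while ((hit = digitRuns.exec(subjectText)) !== null) {
  positions.push({ text: hit[0], start: hit.index, next: digitRuns.lastIndex });
}

console.log(positions);
// [ { text: "1",   start: 1, next: 2 },
//   { text: "22",  start: 4, next: 6 },
//   { text: "333", start: 8, next: 11 } ]

// After the failed final call, lastIndex is reset to 0.
console.log(digitRuns.lastIndex); // 0
```

Leaving out the "g" modifier here would make the loop run forever, because exec() would keep returning the first match.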

RegEx Patterns:

  • ^ ~ Start of string
  • $ ~ End of string
  • . ~ any single character (except a newline, unless the "s" modifier is set)
  • (a|b) ~ a or b
  • (...) ~ group section
  • [abc] ~ item in range (a or b or c)
  • [^abc] ~ Not in range (not a or b or c)
  • a? ~ Zero or One of a
  • a* ~ Zero or more of a
  • a+ ~ One or more of a
  • a{3} ~ exactly 3 of a
  • a{3,} ~ 3 or more of a
  • a{3,6} ~ Between 3 and 6 of a
  • (?!pattern) ~ Negative lookahead. Matches only at positions not followed by pattern. (A bare "!" prefix is not part of JavaScript regex syntax.)
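Several of the patterns in this list can be checked with a small table-driven sketch. The inputs and expected results are chosen for illustration, and most patterns are anchored with ^ and $ so the whole input has to match.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const patternChecks = [
  [/^ab/, "abc", true],             // ^ : start of string
  [/bc$/, "abc", true],             // $ : end of string
  [/a.c/, "abc", true],             // . : any single character
  [/^(a|b)$/, "b", true],           // (a|b) : a or b
  [/^[abc]$/, "d", false],          // [abc] : one of a, b, c
  [/^[^abc]$/, "d", true],          // [^abc] : anything except a, b, c
  [/^a?$/, "", true],               // a? : zero or one
  [/^a*$/, "aaaa", true],           // a* : zero or more
  [/^a+$/, "", false],              // a+ : one or more
  [/^a{3}$/, "aaa", true],          // a{3} : exactly 3
  [/^a{3,}$/, "aa", false],         // a{3,} : 3 or more
  [/^a{3,6}$/, "aaaaaaa", false],   // a{3,6} : between 3 and 6
];

const allChecksPass = patternChecks.every(
  ([pattern, input, expected]) => pattern.test(input) === expected
);
console.log(allChecksPass); // true
```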

Documentation © to Fire G. All rights reserved. This may not be duplicated or reproduced without the consent of the owner.

1 Reader Comment

  • xNephilimx
    September 30th

    For those of you who came here looking for a solution to single-line mode in JavaScript RegEx, notice that there's no such thing as a '/s' modifier in the JavaScript regex implementation. It's just plain not implemented. To work around the dot not matching newline characters you should change the dot for [\s\S] (whitespace and/or not whitespace).

    So:

    having a text like this:

    Maecenas ullamcorper tellus eu metus. Aliquam blandit, tortor eget pellentesque consequat; leo risus malesuada enim, sit amet commodo risus dui vel turpis.

    In tempus elit ut nisl. Sed elementum. Vivamus pretium feugiat purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Quisque et quam.

    the pattern /\(.*?)\/ig won't match, while the pattern /\([\s\S]*?)\/ig will.
