RegEx 101

Sep 09, 2022

A regular expression (regex for short) is a string of text that defines a pattern used to match, locate, and manage text. It's an important tool in a wide variety of computing applications, from programming languages like JS, Java, and Perl to text-processing tools like grep, sed, and vim.

Here are a few helpers to refresh your memory when you need some 'simple' regex to do the job.

Characters

Characters      Legend                                                       Example    Sample match
[abc], [a-c]    Matches any one of the given characters or range             abc[abc]   abca, abcb, abcc
[^abc], [^a-c]  Matches any one character not in the given set or range      abc[^abc]  abcd, abce, abc1
.               Any character except a line break                            bc.        bca, bcd, bc1, bc.
\d              Any digit (equivalent to [0-9])                              c\d        c1, c2, c3
\D              Any non-digit (equivalent to [^0-9])                         c\D        ca, c., c*
\w              Any word character (equivalent to [A-Za-z0-9_])              a\w        aa, a1, a_
\W              Any non-word character (equivalent to [^A-Za-z0-9_])         a\W        a), a$, a?
\s              Any whitespace character (space, tab, line break, etc.)      a\s        "a" followed by a space
\S              Any non-whitespace character                                 a\S        aa
\t              Matches a horizontal tab                                     T\tab      "T", a tab, then "ab"
\r              Matches a carriage return                                    AB\r\nCD   "AB" and "CD" on consecutive lines
\n              Matches a line feed                                          AB\r\nCD   "AB" and "CD" on consecutive lines
\               Escapes a special character or starts a sequence such as \d  \d         0, 1
x|y             Matches either "x" or "y"                                    a|b        a, b
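
To see the character classes in action, here is a minimal JavaScript sketch. The patterns come from the table above; the test inputs and the names cases and results are illustrative choices, and JavaScript is assumed only because the flags listed later in this post are JavaScript's.

```javascript
// Each entry: [pattern from the table, input it should match, input it should not].
// The inputs are illustrative choices for this sketch.
const cases = [
  [/abc[abc]/, "abca", "abcd"],
  [/abc[^abc]/, "abcd", "abca"],
  [/bc./, "bc1", "bc"],
  [/c\d/, "c1", "ca"],
  [/c\D/, "ca", "c1"],
  [/a\w/, "a_", "a)"],
  [/a\W/, "a)", "a_"],
  [/a\s/, "a ", "aa"],
  [/a\S/, "aa", "a "],
  [/T\tab/, "T\tab", "T ab"],
];

const results = cases.map(([pattern, hit, miss]) => ({
  pattern: pattern.source,
  matchesHit: pattern.test(hit),   // expected: true
  matchesMiss: pattern.test(miss), // expected: false
}));

console.table(results);
```

Every row should report matchesHit as true and matchesMiss as false.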

Assertions

Characters  Legend                                                           Example     Sample match
^           Start of string, or start of line in multiline mode              ^abc.*      abc, abcd, abcde
$           End of string, or end of line in multiline mode                  .*xyz$      xyz, wxyz, abcdxyz
\b          Word boundary: position between a word and a non-word character  My.*\bpie   My apple pie
\B          Non-word boundary: position where \b does not match              c.*\Bcat    copycat
x(?=y)      Lookahead: matches "x" only if followed by "y"                   \d+(?=€)    98 in "$1 = 0.98€"
x(?!y)      Negative lookahead: matches "x" only if not followed by "y"      \d+\b(?!€)  1 and 0 in "$1 = 0.98€"
(?<=y)x     Lookbehind: matches "x" only if preceded by "y"                  (?<=\d)\d   8 in "$1 = 0.98€"
(?<!y)x     Negative lookbehind: matches "x" only if not preceded by "y"     (?<!\d)\d   1, 0, and 9 in "$1 = 0.98€"
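
The lookaround rows are the easiest to get wrong, so here is a small JavaScript sketch that runs the table's patterns against the same price string. The variable names are hypothetical, and it assumes a runtime with lookbehind support (any recent Node.js or browser).

```javascript
const price = "$1 = 0.98€";

// Lookahead: digits that are directly followed by "€".
const beforeEuro = price.match(/\d+(?=€)/g);       // ["98"]

// Negative lookahead: whole digit runs that are not followed by "€".
const notBeforeEuro = price.match(/\d+\b(?!€)/g);  // ["1", "0"]

// Lookbehind: a digit that is directly preceded by another digit.
const afterDigit = price.match(/(?<=\d)\d/g);      // ["8"]

// Negative lookbehind: a digit that is not preceded by a digit.
const notAfterDigit = price.match(/(?<!\d)\d/g);   // ["1", "0", "9"]

// Word boundaries: "pie" must start a word, "cat" must not.
const startsWord = /My.*\bpie/.test("My apple pie"); // true
const insideWord = /c.*\Bcat/.test("copycat");       // true
const catAsWord = /\bcat/.test("copycat");           // false

console.log(beforeEuro, notBeforeEuro, afterDigit, notAfterDigit);
console.log(startsWord, insideWord, catAsWord);
```

Note that assertions consume no characters: the "€" is checked but never becomes part of the match.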

Groups

Characters  Legend                                                          Example            Sample match
(x)         Capturing group: matches "x" and remembers the match            A(nt|pple)         Ant (remembers "nt")
(?<name>x)  Named capturing group: matches "x" and stores it under "name"   A(?<m>nt|pple)     Ant (m = "nt")
(?:x)       Non-capturing group: matches "x" without remembering the match  A(?:nt|pple)       Ant
\n          Backreference to the text matched by the nth capturing group    (\d)\+(\d)=\2\+\1  5+6=6+5
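
Here is a short JavaScript sketch of how the three group types differ in what they hand back. The variable names are hypothetical. In the backreference pattern the literal + signs are escaped as \+, since an unescaped + would act as a quantifier.

```javascript
// Capturing group: the submatch is available at index 1.
const captured = "Ant".match(/A(nt|pple)/);
console.log(captured[0], captured[1]);   // "Ant" "nt"

// Named capturing group: the submatch is available under groups.m.
const named = "Apple".match(/A(?<m>nt|pple)/);
console.log(named.groups.m);             // "pple"

// Non-capturing group: only the full match is returned.
const nonCapturing = "Ant".match(/A(?:nt|pple)/);
console.log(nonCapturing.length);        // 1

// Backreferences: \1 and \2 must repeat what groups 1 and 2 matched.
const swapPattern = /(\d)\+(\d)=\2\+\1/;
const swapped = swapPattern.test("5+6=6+5");     // true
const notSwapped = swapPattern.test("5+6=5+6");  // false
console.log(swapped, notSwapped);
```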

Quantifiers

Characters  Legend                                                         Example   Sample match
x*          Matches the preceding item "x" 0 or more times                 a*        (empty string), a, aa, aaa
x+          Matches the preceding item "x" 1 or more times (same as {1,})  a+        a, aa, aaa
x?          Matches the preceding item "x" 0 or 1 times                    ab?       a, ab
x{n}        Matches the preceding item "x" exactly n times                 ab{5}c    abbbbbc
x{n,}       Matches the preceding item "x" at least n times                ab{2,}c   abbc, abbbc, abbbbc
x{n,m}      Matches the preceding item "x" between n and m times (n ≤ m)   ab{2,3}c  abbc, abbbc

Here n and m are non-negative integers.
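
As a quick check of the quantifiers, this JavaScript sketch filters one list of inputs through each of them. The inputs and variable names are made up for illustration, and the patterns are anchored with ^ and $ so the whole input has to fit.

```javascript
const inputs = ["ac", "abc", "abbc", "abbbc", "abbbbc"];

// Helper (hypothetical name): keep the inputs that a pattern matches in full.
const keep = (pattern) => inputs.filter((s) => pattern.test(s));

const zeroOrMore = keep(/^ab*c$/);     // all five inputs
const oneOrMore = keep(/^ab+c$/);      // everything except "ac"
const zeroOrOne = keep(/^ab?c$/);      // "ac", "abc"
const exactlyTwo = keep(/^ab{2}c$/);   // "abbc"
const atLeastTwo = keep(/^ab{2,}c$/);  // "abbc", "abbbc", "abbbbc"
const twoToThree = keep(/^ab{2,3}c$/); // "abbc", "abbbc"

console.log({ zeroOrMore, oneOrMore, zeroOrOne, exactlyTwo, atLeastTwo, twoToThree });
```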

NOTE

By default, quantifiers are greedy: they match as much of the string as possible.
Adding a ? after the quantifier makes it non-greedy (lazy): it matches as few characters as possible and stops as soon as the rest of the pattern can match.

For Example: \d+? for a test string 12345 will match only 1, but \d+ will match the entire string 12345
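
The difference is easy to confirm with a few lines of JavaScript. The first pair repeats the example above; the second pair uses a made-up HTML snippet, since tag matching is where greedy quantifiers most often surprise people. All variable names are hypothetical.

```javascript
const digits = "12345";
const greedy = digits.match(/\d+/)[0];   // "12345"
const lazy = digits.match(/\d+?/)[0];    // "1"

const html = "<b>one</b> and <b>two</b>";
// Greedy .* runs to the last </b> in the string.
const greedyTag = html.match(/<b>.*<\/b>/)[0];   // "<b>one</b> and <b>two</b>"
// Lazy .*? stops at the first </b> it can reach.
const lazyTag = html.match(/<b>.*?<\/b>/)[0];    // "<b>one</b>"

console.log(greedy, lazy, greedyTag, lazyTag);
```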

Flags

Flags are placed after the closing delimiter of the regular expression (in JavaScript's /pattern/flags syntax, after the final /). They modify how the regular expression behaves.

For Example: /a/ matches only a lowercase a, but with the i flag added, /a/i matches both a and A

Flag  Legend
d     Generate indices for substring matches
g     Global search
i     Case-insensitive search
m     Multi-line search
s     Allows . to match newline characters
u     Treats a pattern as a sequence of Unicode code points
y     Performs a sticky search that matches starting at the current position in the target string
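
Here is a rough JavaScript sketch that exercises each flag once. The sample strings and variable names are made up for illustration, and the d flag assumes a fairly recent runtime (it arrived with ES2022).

```javascript
// i: case-insensitive search.
const withoutI = /a/.test("A");   // false
const withI = /a/i.test("A");     // true

// g: find every match instead of only the first.
const allDigits = "a1 b2 c3".match(/\d/g);   // ["1", "2", "3"]

// m: ^ and $ also match at line breaks.
const lines = "one\ntwo";
const lineStarts = lines.match(/^\w+/gm);    // ["one", "two"]

// s: lets . match newline characters.
const withoutS = /one.two/.test(lines);      // false
const withS = /one.two/s.test(lines);        // true

// u: treats an astral character as one code point instead of two code units.
const smiley = "\u{1F600}";
const withoutU = /^.$/.test(smiley);         // false
const withU = /^.$/u.test(smiley);           // true

// y: matches only at lastIndex, without scanning ahead.
const sticky = /b/y;
sticky.lastIndex = 1;
const stickyHit = sticky.test("abc");        // true ("b" is at index 1)
sticky.lastIndex = 0;
const stickyMiss = sticky.test("abc");       // false ("a" is at index 0)

// d: exposes [start, end] indices for the match.
const matchIndices = /b/d.exec("abc").indices[0];   // [1, 2]

console.log(withI, allDigits, lineStarts, withS, withU, stickyHit, matchIndices);
```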

If you wish to test your knowledge:

https://regexr.com
https://regex101.com

Have a good weekend! 👊🏽

Ido’s Substack
