Regular Expressions Explained: Patterns, Flags, and Real-World Examples
Regular expressions are a compact language for describing text patterns. Learn metacharacters, quantifiers, groups, lookaheads, and the most useful patterns every developer needs.
A regular expression (regex) is a sequence of characters that defines a search pattern. The pattern can describe anything from a simple literal string to a complex rule like 'a word starting with a capital letter, followed by one or more digits, optionally followed by whitespace and a two-letter country code'. Every major programming language includes a regex engine, and the core syntax is largely consistent across JavaScript, Python, Java, Go, and Ruby. Advanced features do vary: Go's RE2-based engine, for example, supports neither lookarounds nor backreferences.
Regex is useful for validation (does this string look like an email address?), extraction (find all URLs in this text), replacement (replace all phone number formats with a canonical form), and splitting (break a CSV line correctly, handling quoted commas). It is not useful for parsing hierarchical structures like HTML or JSON — use a proper parser for those.
Literal Characters and Metacharacters
A regex that consists only of literal characters matches that exact sequence. The pattern cat matches the substring 'cat' anywhere in the string. But regex reserves 14 characters as metacharacters with special meaning: . ^ $ * + ? { } [ ] \ | ( ). To match a literal metacharacter, escape it with a backslash: \. matches a literal period, \$ matches a dollar sign.
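Escaping matters most when a pattern is built from text you do not control. The sketch below assumes a helper named escapeRegExp (a hypothetical name, not a built-in) that puts a backslash in front of every metacharacter so the text is matched literally.

```javascript
// Hypothetical helper: escape every regex metacharacter in a string so it
// can be embedded in a pattern and matched literally.
function escapeRegExp(text) {
  // "$&" in the replacement string stands for the character that matched.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$9.99";
const literalPrice = new RegExp(escapeRegExp(price));

console.log(escapeRegExp(price));              // \$9\.99
console.log(literalPrice.test("Only $9.99"));  // true
console.log(literalPrice.test("Only $9x99"));  // false
console.log(/9.99/.test("9x99"));              // true: an unescaped dot matches any character
```

Without the escaping step, the $ would be read as an anchor and the dot as a wildcard, so the pattern would not mean what the input text says.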
The Dot: Match Any Character
The dot . matches any single character except a newline (by default). The pattern c.t matches "cat", "cot", "c8t", "c t", and even "c\tt" (a tab between the letters). It does not match "ct" (nothing in the middle) or "coat" (two characters in the middle). In single-line (dotall) mode, enabled with the s flag in JavaScript, the dot also matches newlines.
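The cases above can be checked directly. This is a small sketch; the variable names are only illustrative.

```javascript
const dotPattern = /c.t/;
const samples = ["cat", "cot", "c8t", "c t", "c\tt", "ct", "coat"];

// One boolean per sample, in the same order as the array above.
const dotResults = samples.map((sample) => dotPattern.test(sample));
console.log(dotResults); // [ true, true, true, true, true, false, false ]

// By default the dot stops at a newline; the s flag lifts that restriction.
const twoLines = "c\nt";
const withoutDotAll = /c.t/.test(twoLines);
const withDotAll = /c.t/s.test(twoLines);
console.log(withoutDotAll, withDotAll); // false true
```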
Character Classes
A character class defines a set of characters that can match at a single position. Square brackets contain the set: [aeiou] matches any lowercase vowel, [0-9] matches any digit, [a-zA-Z] matches any letter. A caret at the start negates the class: [^0-9] matches any character that is not a digit.
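As a quick illustration, the sketch below applies each of those classes to a made-up sample string. It uses the g flag (covered later in this article) only to collect every match rather than the first.

```javascript
const label = "Order 66, aisle B7"; // sample text, chosen only for illustration

// [aeiou] matches lowercase vowels only, so the capital "O" is skipped.
const lowercaseVowels = label.match(/[aeiou]/g);
console.log(lowercaseVowels); // [ 'e', 'a', 'i', 'e' ]

// A negated class removes everything that is not a digit.
const digitsOnly = label.replace(/[^0-9]/g, "");
console.log(digitsOnly); // 667

// Two ranges in one class cover both letter cases.
const letterCount = label.match(/[a-zA-Z]/g).length;
console.log(letterCount); // 11
```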
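Shorthand character classes provide common sets: \d matches [0-9], \w matches [a-zA-Z0-9_], \s matches [ \t\r\n\f\v]. Their uppercase inverses match the complement: \D matches any non-digit, \W matches any non-word character, \S matches any non-whitespace. These are the ASCII definitions, and exact coverage varies by engine: in JavaScript, \s also matches Unicode whitespace such as the non-breaking space, and in Python 3, \d and \w match Unicode digits and letters by default.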
\d = [0-9]
\w = [a-zA-Z0-9_]
\s = [ \t\r\n\f\v]
\D = [^0-9]
\W = [^a-zA-Z0-9_]
\S = [^\s]
Anchors
Anchors do not match characters; they match positions. ^ matches the start of the string (or start of a line in multiline mode). $ matches the end of the string (or end of a line in multiline mode). \b matches a word boundary: the position between a word character and a non-word character, or between a word character and the start or end of the string.
- ^hello — matches 'hello' only at the start of the string
- world$ — matches 'world' only at the end
- ^hello world$ — matches the exact string 'hello world' and nothing else
- \bcat\b — matches the word 'cat' but not 'concatenate' or 'scattered'
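The anchor examples in the list can be verified with a short script. This is a sketch; the sample inputs are made up for illustration.

```javascript
// Each entry pairs a pattern with an input string.
const anchorChecks = [
  [/^hello/, "hello world"],         // at the start
  [/^hello/, "say hello"],           // not at the start
  [/world$/, "hello world"],         // at the end
  [/world$/, "world peace"],         // not at the end
  [/^hello world$/, "hello world"],  // whole string
  [/^hello world$/, "hello world!"], // extra trailing character
  [/\bcat\b/, "the cat sat"],        // standalone word
  [/\bcat\b/, "concatenate"],        // inside a longer word
  [/\bcat\b/, "scattered"],          // inside a longer word
];

const anchorResults = anchorChecks.map(([pattern, input]) => pattern.test(input));
console.log(anchorResults);
// [ true, false, true, false, true, false, true, false, false ]

// In multiline mode, ^ and $ also match at line breaks.
const lineWithoutM = /^world$/.test("hello\nworld");
const lineWithM = /^world$/m.test("hello\nworld");
console.log(lineWithoutM, lineWithM); // false true
```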
Quantifiers
Quantifiers specify how many times the preceding element must match. They always apply to the immediately preceding character, character class, or group.
- * — zero or more times: ab* matches 'a', 'ab', 'abb', 'abbb'
- + — one or more times: ab+ matches 'ab', 'abb', 'abbb' but not 'a'
- ? — zero or one time: colou?r matches both 'color' and 'colour'
- {n} — exactly n times: \d{4} matches exactly four digits
- {n,} — at least n times: \d{2,} matches two or more digits
- {n,m} — between n and m times (inclusive): \d{2,4} matches 2, 3, or 4 digits
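To see the boundaries of each quantifier, the sketch below uses a hypothetical helper, matchesWhole, that wraps a pattern in ^(?:...)$ so that only a match of the entire string counts.

```javascript
// Hypothetical helper: true only if the whole input matches the pattern source.
function matchesWhole(source, input) {
  return new RegExp(`^(?:${source})$`).test(input);
}

console.log(matchesWhole("ab*", "a"));           // true  (zero b's is allowed)
console.log(matchesWhole("ab*", "abbb"));        // true
console.log(matchesWhole("ab+", "a"));           // false (needs at least one b)
console.log(matchesWhole("ab+", "abb"));         // true
console.log(matchesWhole("colou?r", "color"));   // true
console.log(matchesWhole("colou?r", "colour"));  // true
console.log(matchesWhole("colou?r", "colouur")); // false (at most one u)
console.log(matchesWhole("\\d{4}", "2024"));     // true
console.log(matchesWhole("\\d{4}", "202"));      // false
console.log(matchesWhole("\\d{2,}", "7"));       // false
console.log(matchesWhole("\\d{2,}", "12345"));   // true
console.log(matchesWhole("\\d{2,4}", "123"));    // true
console.log(matchesWhole("\\d{2,4}", "12345"));  // false
```

The anchoring is what makes the negative cases fail: without it, \d{2,4} would still find four digits inside '12345'.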
Greedy vs Lazy Quantifiers
By default, quantifiers are greedy: they match as much as possible. The pattern <.+> applied to "<b>bold</b>" matches the entire string "<b>bold</b>", not just "<b>", because the engine extends .+ as far as it can while still allowing the overall match to succeed. Adding ? after any quantifier makes it lazy (non-greedy): <.+?> matches "<b>" and "</b>" separately.
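The difference is easy to reproduce. A minimal sketch using the same input string:

```javascript
const html = "<b>bold</b>";

// Greedy: .+ runs to the last ">" it can reach.
const greedyMatches = html.match(/<.+>/g);
console.log(greedyMatches); // [ '<b>bold</b>' ]

// Lazy: .+? stops at the first ">" that completes a match.
const lazyMatches = html.match(/<.+?>/g);
console.log(lazyMatches); // [ '<b>', '</b>' ]
```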
Groups and Alternation
Parentheses create a capturing group, which lets a quantifier apply to a whole sequence of characters and captures the matched text for extraction. (\d+) captures one or more digits into group 1. In JavaScript, matched groups are available as match[1], match[2], etc.
Non-capturing groups (?:...) apply grouping and quantifiers without capturing. Use non-capturing groups when you need to group for quantification but do not need the captured value — they are slightly more efficient and do not pollute the match result.
Alternation with | means 'match either the left or right expression'. The pattern cat|dog matches 'cat' or 'dog'. Parentheses control the scope of alternation: (cats?|dogs?) matches 'cat', 'cats', 'dog', or 'dogs'.
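A short sketch of all three ideas together: capturing, non-capturing, and alternation. The sample strings are invented for illustration.

```javascript
// Capturing groups: each pair of parentheses fills one slot in the result.
const dateMatch = "Released 2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(dateMatch[0]); // 2024-03-15 (the whole match)
console.log(dateMatch[1], dateMatch[2], dateMatch[3]); // 2024 03 15

// Non-capturing group: the + applies to "ha" as a unit, but nothing is captured,
// so the result holds only the whole match.
const laugh = "hahaha!".match(/(?:ha)+/);
console.log(laugh[0], laugh.length); // hahaha 1

// Alternation: either branch may match at each position.
const pets = "cats and dogs and a cat".match(/cats?|dogs?/g);
console.log(pets); // [ 'cats', 'dogs', 'cat' ]
```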
Lookahead and Lookbehind
Lookahead and lookbehind are zero-width assertions — they check what comes before or after the current position without consuming characters. They are used to match text only in specific contexts.
- (?=...) — positive lookahead: \d+(?= dollars) matches digits only when followed by ' dollars'
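- (?!...) — negative lookahead: \d+(?! dollars) matches digits NOT followed by ' dollars'. Because \d+ can backtrack, '100 dollars' still yields the partial match '10'; \d+(?!\d| dollars) rejects it entirely.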
- (?<=...) — positive lookbehind: (?<=\$)\d+ matches digits only when preceded by '$'
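- (?<!...) — negative lookbehind: (?<!\$)\d+ matches digits NOT preceded by '$'. In '$100' it still matches '00', since the second digit is preceded by '1'; (?<![$\d])\d+ rejects it entirely.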
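The four assertions can be compared side by side on one sample string. This sketch also shows a common surprise with the negative forms: the digits pattern can settle for part of a number, so the last two patterns add \d to the assertion to prevent that. The sample text is invented.

```javascript
const prices = "5 dollars, 20 euros, $30";

const followedByDollars = prices.match(/\d+(?= dollars)/g);
console.log(followedByDollars); // [ '5' ]

const notFollowedByDollars = prices.match(/\d+(?! dollars)/g);
console.log(notFollowedByDollars); // [ '20', '30' ]

const afterDollarSign = prices.match(/(?<=\$)\d+/g);
console.log(afterDollarSign); // [ '30' ]

// The "0" of "$30" is preceded by "3", not "$", so it slips through.
const notAfterDollarSign = prices.match(/(?<!\$)\d+/g);
console.log(notAfterDollarSign); // [ '5', '20', '0' ]

// Also forbidding a preceding digit keeps the match on whole numbers.
const wholeNumbersNotAfterDollar = prices.match(/(?<![$\d])\d+/g);
console.log(wholeNumbersNotAfterDollar); // [ '5', '20' ]

// Same idea for lookahead: "100 dollars" partially matches as "10" unless
// the assertion also forbids a following digit.
const partial = "100 dollars".match(/\d+(?! dollars)/);
const strict = "100 dollars".match(/\d+(?!\d| dollars)/);
console.log(partial[0], strict); // 10 null
```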
Flags
Flags modify how the regex engine interprets the pattern. In JavaScript, flags are appended after the closing / of a regex literal, or passed as a string to the RegExp constructor.
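- g (global) — find all matches, not just the first. Required when a regex is passed to String.prototype.replaceAll() or matchAll(); with exec(), it makes each call resume from lastIndex so repeated calls step through the matches.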
- i (case-insensitive) — 'Cat' matches /cat/i
- m (multiline) — ^ and $ match start/end of each line, not just the whole string
- s (dotAll) — . matches newlines in addition to all other characters
- u (unicode) — enables full Unicode matching; required for \p{} Unicode property escapes
- d (indices) — populates match.indices with start/end positions of each captured group
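The sketch below exercises each flag once. It assumes a reasonably recent runtime, since the d flag and replaceAll() are newer additions; the sample strings are made up.

```javascript
const lines = "Cat\ncat\nCAT";

const firstOnly = lines.match(/cat/);           // no g: first match only
const anyCase = lines.match(/cat/gi);           // g + i: every match, any case
const wholeLines = lines.match(/^cat$/gim);     // m: ^ and $ work per line
console.log(firstOnly[0], firstOnly.index);     // cat 4
console.log(anyCase);                           // [ 'Cat', 'cat', 'CAT' ]
console.log(wholeLines);                        // [ 'Cat', 'cat', 'CAT' ]

// Flags can also be passed as a string to the RegExp constructor.
const viaConstructor = lines.match(new RegExp("cat", "gi"));

const dotAll = /a.b/s.test("a\nb");                         // s: dot matches newline
const uppercaseStart = /^\p{Lu}/u.test("\u00C9clair");      // u: Unicode property escape
const withIndices = /(\d+)/d.exec("id=42");                 // d: positions per group
console.log(dotAll, uppercaseStart, withIndices.indices[1]); // true true [ 3, 5 ]

// With g, exec() remembers where it stopped via lastIndex.
const stepper = /\d+/g;
const firstStep = stepper.exec("a1b22");
const secondStep = stepper.exec("a1b22");
const thirdStep = stepper.exec("a1b22");
console.log(firstStep[0], secondStep[0], thirdStep); // 1 22 null

// replaceAll() accepts a regex only when it carries the g flag.
const replaced = "1-2-3".replaceAll(/-/g, "+");
console.log(replaced); // 1+2+3
```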
10 Patterns Every Developer Should Know
- Email (basic): [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- URL: https?:\/\/[^\s/$.?#].[^\s]*
- IPv4 address (shape only; octets above 255 are not rejected): \b(?:\d{1,3}\.){3}\d{1,3}\b
- UUID: [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
- Hex color: #(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})\b
- ISO date (YYYY-MM-DD): \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
- US phone number: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
- HTML tag: <([a-zA-Z][a-zA-Z0-9]*)(?:[^>]*)>(.*?)<\/\1>
- Slug (URL-friendly): ^[a-z0-9]+(?:-[a-z0-9]+)*$
- Strong password (8+ chars, upper, lower, digit, special): ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^a-zA-Z\d]).{8,}$
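Here is a sketch of a few of these patterns in use. Note that they check shape rather than meaning: the date pattern accepts February 30, and the IPv4 pattern accepts 999.999.999.999. The helper isValidIPv4 is a hypothetical addition that layers a numeric range check on top of the regex.

```javascript
const slugPattern = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
const isoDatePattern = /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/;
const hexColorPattern = /#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})\b/g;
const strongPasswordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^a-zA-Z\d]).{8,}$/;

console.log(slugPattern.test("regex-explained-101")); // true
console.log(slugPattern.test("Regex_Explained"));     // false
console.log(isoDatePattern.test("2024-13-01"));       // false (month 13)
console.log(isoDatePattern.test("2024-02-30"));       // true  (shape only, not a calendar check)
console.log(strongPasswordPattern.test("Passw0rd!")); // true
console.log(strongPasswordPattern.test("password"));  // false

const colors = "color: #fff; background: #1a2B3c;".match(hexColorPattern);
console.log(colors); // [ '#fff', '#1a2B3c' ]

// Hypothetical helper: regex for the shape, plain arithmetic for the range.
function isValidIPv4(text) {
  if (!/^(?:\d{1,3}\.){3}\d{1,3}$/.test(text)) return false;
  return text.split(".").every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.0.1")); // true
console.log(isValidIPv4("999.1.1.1"));   // false
```

Splitting the work this way keeps the pattern readable; encoding the 0 to 255 range in the regex itself is possible but much harder to maintain.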
Catastrophic Backtracking
Nested quantifiers with overlapping matches can cause regex engines to run for exponential time on carefully crafted inputs, a vulnerability known as ReDoS (Regular Expression Denial of Service). The classic example is (a+)+$ applied to a string like 'aaaaaaaaab'. Because the trailing 'b' makes the overall match fail, the engine tries every possible way to split the 'a' characters among the group's repetitions, and the number of combinations roughly doubles with each additional 'a'.
To avoid catastrophic backtracking: do not nest quantifiers on patterns that can match the same characters ((a+)+, (a*)*), use atomic groups or possessive quantifiers when available, and test your regex with adversarial inputs before deploying in web-facing code.
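As a sketch of the first piece of advice, the nested pattern below can be replaced by a flat one that accepts exactly the same strings. JavaScript has no atomic groups or possessive quantifiers, so the example also assumes a hypothetical guard, testWithLengthLimit, that caps input length before matching. The probe string is kept short on purpose so the risky pattern still finishes quickly.

```javascript
const risky = /^(a+)+$/; // nested quantifiers: many ways to split the same run of a's
const safer = /^a+$/;    // same accepted strings, one way to match them

// Hypothetical guard: refuse oversized input before the engine sees it.
function testWithLengthLimit(pattern, input, maxLength = 256) {
  if (input.length > maxLength) {
    throw new RangeError(`input longer than ${maxLength} characters`);
  }
  return pattern.test(input);
}

const probe = "a".repeat(12) + "b"; // short adversarial-style input

console.log(risky.test("aaaa"), safer.test("aaaa")); // true true
console.log(risky.test(probe), safer.test(probe));   // false false
console.log(testWithLengthLimit(safer, probe));      // false
```

A length cap does not fix a vulnerable pattern, but it bounds the worst case while the pattern is being rewritten.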