Regex Cheatsheet: 20 Patterns Every Developer Should Know

This regex cheatsheet gives you 20 patterns you can copy and paste directly into your JavaScript, Python or PCRE (Perl Compatible Regular Expressions) projects. You will find real patterns for email validation, URL parsing, phone number matching and more, along with the language-specific details most references skip entirely. No abstract syntax tables here.

What Are Regular Expressions and Why Do They Matter

A regular expression (regex) is a sequence of characters that defines a search pattern. Developers use regex syntax to validate input, extract data from strings, replace text and parse structured formats like log files or URLs. One well-crafted pattern can replace dozens of lines of manual string manipulation code.

The core building blocks of any regex are character classes, quantifiers, anchors, groups and assertions. Once you understand these five categories, every pattern in this article becomes readable and modifiable. The sections below cover each category with concrete examples before moving to the 20 patterns you will actually use in production.

Essential Character Classes

Character classes tell the regex engine which characters to match at a given position. They are the foundation of the entire regex syntax.

Built-in Shorthand Classes

Shorthand classes save you from writing long bracket expressions. The most common ones are:

\d matches any digit (0-9), equivalent to [0-9]
\w matches any word character: letters, digits and underscores, equivalent to [A-Za-z0-9_]
\s matches any whitespace character including spaces, tabs and newlines
\D, \W and \S are the negated versions of the above
. matches any character except a newline by default

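In JavaScript you write these directly in a regex literal: /\d+/. In Python, you pass the pattern as a raw string: r'\d+'. Without the r prefix, Python's string parser interprets backslashes before the regex engine sees them, so '\b' becomes a backspace character instead of a word boundary. One more difference: JavaScript's \d and \w are ASCII-only, while Python 3 makes them Unicode-aware for str patterns unless you pass re.ASCII.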
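As a quick illustration of the shorthand classes in Python, the sketch below runs each one through re.findall. The sample string is invented for the example.

```python
import re

sample = "Order 66 shipped\tto user_42 on 2024-01-15"

# \d+ finds runs of digits
digits = re.findall(r"\d+", sample)

# \w+ finds runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample)

# \s matches each space and the tab
whitespace = re.findall(r"\s", sample)

# Without the r prefix, every backslash must be doubled to get the same pattern
same_pattern = "\\d+" == r"\d+"

print(digits)      # ['66', '42', '2024', '01', '15']
print(words)       # ['Order', '66', 'shipped', 'to', 'user_42', 'on', '2024', '01', '15']
print(len(whitespace), same_pattern)  # 6 True
```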

Custom Bracket Expressions

You can build custom character classes with square brackets. [aeiou] matches any single lowercase vowel. [^aeiou] matches any character that is not one of those vowels. Ranges like [a-z] and [A-Z0-9] keep patterns concise. POSIX classes such as [:alpha:] and [:digit:], which are written inside a bracket expression as [[:alpha:]], are available in some engines like PCRE and Ruby but not in JavaScript's built-in RegExp or Python's re module.
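A small sketch of bracket expressions in Python, using made-up sample strings. Note that [^aeiou] on its own would also match spaces and punctuation, so the negated class here excludes whitespace as well.

```python
import re

phrase = "Regular Expression"

# Any single lowercase vowel
vowels = re.findall(r"[aeiou]", phrase)

# Anything that is neither a lowercase vowel nor whitespace
non_vowels = "".join(re.findall(r"[^aeiou\s]", phrase))

# Two ranges combined in one class: uppercase letters or digits
caps_or_digits = re.findall(r"[A-Z0-9]", "Room B12, Floor 3")

print(vowels)          # ['e', 'u', 'a', 'e', 'i', 'o']
print(non_vowels)      # RglrExprssn
print(caps_or_digits)  # ['R', 'B', '1', '2', 'F', '3']
```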


Quantifiers and Repetition

Quantifiers control how many times a preceding token must appear. Getting quantifiers wrong is the fastest way to write a pattern that matches nothing, or matches far too much.

Greedy vs Lazy Quantifiers

By default, quantifiers are greedy: they match as many characters as possible. Adding a ? after a quantifier makes it lazy, matching as few characters as possible.

For example, given the string <b>bold</b>, the greedy pattern <.+> matches the entire string. The lazy pattern <.+?> matches only <b>. This distinction is critical when parsing HTML fragments or JSON snippets.

The standard quantifiers are:

* matches 0 or more occurrences
+ matches 1 or more occurrences
? matches 0 or 1 occurrence (also makes other quantifiers lazy)
{n} matches exactly n occurrences
{n,m} matches between n and m occurrences
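The greedy and lazy behavior described above can be checked with a short Python sketch. It reuses the <b>bold</b> string from the text; the counted-quantifier inputs are invented for the example.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ grabs as much as it can, so the match runs to the last ">"
greedy = re.search(r"<.+>", html).group()

# Lazy: .+? stops at the first ">" that lets the pattern succeed
lazy = re.search(r"<.+?>", html).group()

# Counted quantifiers: exactly 4 digits, and between 2 and 3 digits
exact = re.fullmatch(r"\d{4}", "2024") is not None
ranged = [bool(re.fullmatch(r"\d{2,3}", s)) for s in ("7", "42", "123", "1234")]

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
print(exact, ranged)  # True [False, True, True, False]
```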

Anchors and Boundaries

Anchors do not match characters. They match positions within a string. Two anchors appear in almost every validation pattern.

^ asserts position at the start of a string. $ asserts position at the end. Together, ^pattern$ ensures the entire string must match the pattern, not just a substring. This is the difference between a pattern that validates an email address and one that merely finds an email-like fragment inside a longer string.

\b is the word boundary anchor. It matches the position between a word character and a non-word character. The pattern \bcat\b matches the word "cat" but not "concatenate". Use \B for the inverse: positions that are not word boundaries.

In multiline mode (flag m in JavaScript, re.MULTILINE in Python), ^ and $ match at the start and end of each line rather than the entire string. This changes behavior significantly when you process log files or multi-line text blocks.
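A minimal Python sketch of the three anchor behaviors described in this section. The sample strings are invented for the example.

```python
import re

text = "ZIP: 90210 (CA)"

# Unanchored: finds a 5-digit fragment inside a longer string
fragment_found = re.search(r"\d{5}", text) is not None

# Anchored: the whole string must be exactly 5 digits
whole_string_valid = re.search(r"^\d{5}$", text) is not None

# \b keeps "cat" from matching inside "concatenate"
cats = re.findall(r"\bcat\b", "cat concatenate cat.")

log = "ERROR disk full\nINFO ok\nERROR timeout"

# Default mode: ^ matches only at the start of the string
default_hits = re.findall(r"^ERROR", log)

# Multiline mode: ^ matches at the start of every line
multiline_hits = re.findall(r"^ERROR", log, re.MULTILINE)

print(fragment_found, whole_string_valid)       # True False
print(cats)                                     # ['cat', 'cat']
print(len(default_hits), len(multiline_hits))   # 1 2
```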

Groups and Backreferences

Groups let you treat multiple tokens as a single unit and capture matched text for later use. They are among the most powerful regex features because they provide both structure and reuse.

Capturing Groups

Parentheses () create a capturing group. After a match, you can retrieve the captured text by index. In JavaScript: match[1] gives you the first group. In Python: match.group(1) does the same. Named groups make code more readable: (?P<year>\d{4}) in Python or (?<year>\d{4}) in JavaScript (ES2018 and later).
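To make the difference between indexed and named groups concrete, here is a short Python sketch. The sentence being searched is invented for the example.

```python
import re

sentence = "Released on 2024-03-15."

# Numbered groups: retrieve captures by position
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", sentence)
year_by_index = m.group(1)

# Named groups: Python uses the (?P<name>...) syntax
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", sentence)
year_by_name = named.group("year")
parts = named.groupdict()

print(year_by_index, year_by_name)  # 2024 2024
print(parts)  # {'year': '2024', 'month': '03', 'day': '15'}
```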

Non-Capturing Groups and Backreferences

When you need grouping without capturing, use (?:...). This avoids polluting your match array with groups you do not need, and it can give a small performance benefit because the engine does not have to store the captured text.

A backreference lets you match the same text again later in the pattern. (\w+)\s\1 matches repeated words like "the the". The \1 refers back to whatever group 1 captured. Named backreferences use \k<name> in JavaScript and (?P=name) in Python.
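A brief Python sketch of both ideas, using invented sample text. With re.findall, the effect of a non-capturing group is easy to see because the function returns only the captured groups.

```python
import re

people = "Mr. Smith met Ms. Jones"

# Capturing the title too: each result is a (title, name) tuple
with_titles = re.findall(r"(Mr|Ms)\. (\w+)", people)

# Non-capturing group for the title: only the names come back
names_only = re.findall(r"(?:Mr|Ms)\. (\w+)", people)

sentence = "it was the the best"

# Numbered backreference: \1 must repeat whatever group 1 captured
repeated = re.search(r"(\w+)\s\1", sentence).group()

# Named backreference, Python syntax
repeated_word = re.search(r"(?P<word>\w+)\s(?P=word)", sentence).group("word")

print(with_titles)   # [('Mr', 'Smith'), ('Ms', 'Jones')]
print(names_only)    # ['Smith', 'Jones']
print(repeated)      # the the
print(repeated_word) # the
```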


Lookarounds and Assertions

Lookarounds match a position based on what comes before or after it, without consuming characters. They belong to the broader category of zero-width assertions, which also includes anchors.

There are four types: lookahead (?=), negative lookahead (?!), lookbehind (?<=) and negative lookbehind (?<!). A practical example: \d+(?= dollars) matches a number only when followed by the word " dollars", but the matched result excludes " dollars" itself.

Lookbehind support varies by engine. JavaScript added lookbehind in ES2018. Python's re module supports fixed-width lookbehind but not variable-width. PCRE supports variable-width lookbehind with some restrictions. Always verify which engine version your runtime uses before relying on advanced lookaround syntax.
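The four lookaround types can be sketched in Python as follows. The sample sentence is invented, and the \b in the negative examples is an added safeguard: without it, the engine can backtrack to a shorter run of digits (for example "15" out of "150") and still satisfy the negative assertion.

```python
import re

text = "Paid 150 dollars and 30 euros, refund $20"

# Lookahead: digits followed by " dollars" (which is not part of the match)
before_dollars = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead: whole numbers NOT followed by " dollars"
not_dollars = re.findall(r"\b\d+\b(?! dollars)", text)

# Lookbehind: digits preceded by a literal "$"
after_sign = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers NOT preceded by "$"
no_sign = re.findall(r"(?<!\$)\b\d+", text)

print(before_dollars)  # ['150']
print(not_dollars)     # ['30', '20']
print(after_sign)      # ['20']
print(no_sign)         # ['150', '30']
```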

20 Copy-Pasteable Validation Patterns

The patterns below are the practical core of this regex cheatsheet. Each one is tested and ready to use. Language-specific versions are noted where syntax differs.

Input Validation Patterns

Email (basic): ^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$
URL (http/https): ^https?:\/\/([\w-]+\.)+[\w-]+(\/[\w./?%&=-]*)?$
IPv4 address: ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
US phone number: ^\+?1?\s?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
Postal/ZIP code (US): ^\d{5}(-\d{4})?$
Strong password (8+ chars, upper, lower, digit, special): ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
ISO 8601 date (YYYY-MM-DD): ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Credit card number (Visa/MC/Amex): ^4\d{12}(\d{3})?$ (Visa), ^5[1-5]\d{14}$ (Mastercard), ^3[47]\d{13}$ (Amex)
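A minimal sketch of how a few of these validation patterns might be wired into Python. The PATTERNS table and the is_valid helper are hypothetical names chosen for illustration; the patterns themselves are the ZIP, ISO date and IPv4 entries from the list above, copied unchanged.

```python
import re

# Hypothetical lookup table; the patterns come from the list above
PATTERNS = {
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "iso_date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "ipv4": re.compile(
        r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
    ),
}

def is_valid(kind, value):
    """Return True when the whole value matches the named pattern."""
    # fullmatch also rejects a trailing newline, which a bare $ would accept
    return PATTERNS[kind].fullmatch(value) is not None

print(is_valid("zip", "90210-1234"))      # True
print(is_valid("zip", "9021"))            # False
print(is_valid("iso_date", "2024-13-01")) # False
print(is_valid("ipv4", "192.168.1.1"))    # True
print(is_valid("ipv4", "256.1.1.1"))      # False
```

These patterns check shape, not meaning: the ISO date pattern accepts 2024-02-30 because it does not know how many days February has. Pair it with a real date parser when calendar validity matters.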

Extraction and Parsing Patterns

HTML tag content: <(\w+)[^>]*>(.*?)<\/\1>
Hex color code: #([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})
Extract domain from URL: https?:\/\/(?:www\.)?([^\/\s]+)
Slug (URL-safe string): ^[a-z0-9]+(?:-[a-z0-9]+)*$
Semantic version (semver): ^(\d+)\.(\d+)\.(\d+)$
GitHub username: ^[a-zA-Z\d](?:[a-zA-Z\d-]{0,37}[a-zA-Z\d])?$
Twitter/X handle: ^@?[A-Za-z0-9_]{1,15}$
UUID v4: ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
JWT token structure: ^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$
File extension: \.([a-zA-Z0-9]+)$
Duplicate words: \b(\w+)\s+\1\b
Whitespace normalization: \s{2,} (replace with single space)
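The sketch below applies several of the extraction patterns above in Python. All input strings are invented for the example; the patterns are copied from the list unchanged.

```python
import re

# Hex color code: findall returns the captured part after the "#"
css = "a { color: #1A2b3C; } b { color: #fff; }"
colors = re.findall(r"#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})", css)

# Extract domain from URL: group 1 holds the host without "www."
link = "See https://www.example.com/docs?page=2 for details"
domain = re.search(r"https?:\/\/(?:www\.)?([^\/\s]+)", link).group(1)

# File extension: only the last extension is captured
extension = re.search(r"\.([a-zA-Z0-9]+)$", "archive.tar.gz").group(1)

# Duplicate words: replace each doubled word with a single copy
deduped = re.sub(r"\b(\w+)\s+\1\b", r"\1", "This is is a test test")

# Whitespace normalization: collapse runs of 2+ whitespace characters
normalized = re.sub(r"\s{2,}", " ", "too    many   spaces")

# Semantic version: three numeric groups
version = re.fullmatch(r"^(\d+)\.(\d+)\.(\d+)$", "1.24.3").groups()

print(colors)      # ['1A2b3C', 'fff']
print(domain)      # example.com
print(extension)   # gz
print(deduped)     # This is a test
print(normalized)  # too many spaces
print(version)     # ('1', '24', '3')
```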

Language-Specific Syntax Differences

Most regex cheatsheet resources treat all engines as identical. They are not. JavaScript, Python and PCRE each have quirks that will break patterns copied between them without adjustment.

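In JavaScript, you write regex literals as /pattern/flags or use new RegExp('pattern', 'flags'). The available flags include g (global), i (case-insensitive), m (multiline), s (dotAll, added in ES2018) and u (Unicode). The u flag makes the engine work on full code points instead of UTF-16 code units, so . and character classes handle emoji correctly, and it enables Unicode property escapes such as \p{L}. Note that \w and \d stay ASCII-only even with u; use \p{L} or \p{N} to match non-ASCII letters and digits.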

Python uses the re module. Always prefix string patterns with r to create raw strings: r'\d+'. Python's re.fullmatch() anchors the pattern to the full string, making the ^ and $ anchors redundant in most validation scenarios. Python 3.11 added the re.NOFLAG constant, along with support for atomic groups and possessive quantifiers.
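A short Python sketch of how re.search, re.match and re.fullmatch differ in anchoring, using invented inputs. It also shows one reason to prefer fullmatch for validation: in Python, $ matches just before a trailing newline.

```python
import re

pattern = r"\d{5}"

# search: the pattern may match anywhere in the string
found_anywhere = re.search(pattern, "id 90210 ok") is not None

# match: anchored at the start only, trailing text is allowed
matched_at_start = re.match(pattern, "90210 ok") is not None

# fullmatch: the entire string must match
full_with_extra = re.fullmatch(pattern, "90210 ok") is not None
full_exact = re.fullmatch(pattern, "90210") is not None

# $ accepts a trailing newline, fullmatch does not
dollar_accepts_newline = re.match(r"^\d{5}$", "90210\n") is not None
fullmatch_accepts_newline = re.fullmatch(pattern, "90210\n") is not None

print(found_anywhere, matched_at_start)                   # True True
print(full_with_extra, full_exact)                        # False True
print(dollar_accepts_newline, fullmatch_accepts_newline)  # True False
```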

PCRE (used in PHP, Nginx, Apache and many other tools) supports atomic groups (?>...) and possessive quantifiers like ++ and *+. These help prevent catastrophic backtracking, a key performance concern covered in the next section.


Testing and Performance Considerations

Testing your patterns before deploying them is non-negotiable. Regex101 (regex101.com) is one of the most capable free tools available: it shows match groups, provides a match information panel, explains each token and lets you switch between JavaScript, Python and PCRE flavors. RexEgg also offers detailed documentation on engine-specific behavior.

Performance becomes a real issue with complex patterns. Catastrophic backtracking occurs when a regex engine explores an exponential number of paths trying to find a match. The classic example is ^(a+)+$ against a string of "a" characters followed by a non-matching character such as "!". Because the nested quantifiers can split the run of "a" characters in exponentially many ways, the engine tries every split before it reports failure. Solutions include atomic groups and possessive quantifiers where the engine supports them, and restructuring patterns to reduce ambiguity.
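The effect is easy to observe with a rough Python sketch. The time_match helper is a hypothetical name for illustration, the input length of 18 is kept small on purpose so the slow case still finishes quickly, and the absolute timings will vary by machine and Python version.

```python
import re
import time

def time_match(pattern, text):
    """Return (matched, seconds) for one re.match call."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start

# 18 "a" characters followed by a character that forces the match to fail
text = "a" * 18 + "!"

# Nested quantifiers: the engine tries every way to split the run of a's
nested_matched, nested_seconds = time_match(r"^(a+)+$", text)

# Restructured pattern: accepts the same strings with no ambiguity
flat_matched, flat_seconds = time_match(r"^a+$", text)

print(nested_matched, flat_matched)  # False False
print(f"nested: {nested_seconds:.6f}s, flat: {flat_seconds:.6f}s")
```

Each extra "a" in the input roughly doubles the work for the nested pattern, while the flat pattern grows linearly. Both accept exactly the same strings, which is why restructuring is usually the first fix to try.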

As a general rule: keep patterns as specific as possible. Use \d{4} instead of \d+ when you know you need exactly 4 digits. Avoid nested quantifiers on overlapping character classes. MDN Web Docs and RexEgg both have dedicated sections on performance that are worth reading if you run regex against large datasets.

Frequently Asked Questions

What is the difference between a greedy and a lazy quantifier?

A greedy quantifier like + or * matches as many characters as possible while still allowing the overall pattern to succeed. A lazy quantifier (formed by adding ? after the quantifier, such as +? or *?) matches as few characters as possible. Use lazy quantifiers when extracting content between delimiters such as HTML tags or quoted strings.

Does regex syntax differ between JavaScript and Python?

Yes, meaningfully so. Python uses raw string literals r'...' to avoid double-escaping backslashes, supports named groups with (?P<name>) syntax and has re.fullmatch() for whole-string matching. JavaScript writes regex as literals /pattern/, uses (?<name>) for named groups (ES2018+) and requires the u flag for proper Unicode handling. PCRE adds possessive quantifiers and atomic groups, which JavaScript lacks and Python's standard re module supports only from version 3.11.

How do I test a regex pattern without running code?

Regex101 is the most widely recommended tool. It supports JavaScript, Python (re and regex
