Regular Expressions (RegEx)
Introduction
Regular expressions (RegEx) are like blueprints for searching and manipulating text. They allow you to find, replace, and validate data with precision. At its core, a regular expression is a sequence of characters and symbols that forms a search pattern.
RegEx is integrated into many tools (like grep and sed) and programming languages.
Basic Syntax and Special Characters
. (dot): Matches any single character except a newline.
* (asterisk): Matches zero or more occurrences of the preceding character or pattern.
+ (plus): Matches one or more occurrences of the preceding character or pattern.
? (question mark): Matches zero or one occurrence of the preceding character or pattern.
^ (caret): Matches the beginning of a line (in many engines, the beginning of the whole input unless multiline mode is enabled).
$ (dollar sign): Matches the end of a line (in many engines, the end of the whole input unless multiline mode is enabled).
\d: Matches any digit (0-9).
\w: Matches any word character (letters, digits, or underscores).
\s: Matches any whitespace character (space, tab, or newline).
\b: Matches a word boundary (start or end of a word).
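The symbols above can be tried out in any RegEx-capable language. The following is a minimal sketch using Python's standard `re` module; the sample strings and variable names are made up for illustration, and other engines may differ in details such as multiline handling.

```python
import re

text = "Order 66 shipped\nto Bob_7 today"

# \d+ : one or more digits in a row
digits = re.findall(r"\d+", text)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)

# In Python, ^ and $ anchor to each line only when re.MULTILINE is set
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# *, + and ? differ in how many "b" characters they accept
sample = "ac abc abbc"
star = re.findall(r"ab*c", sample)      # zero or more b
plus = re.findall(r"ab+c", sample)      # one or more b
optional = re.findall(r"ab?c", sample)  # zero or one b

# . matches any character except a newline, so "b\nt" is skipped
dot = re.findall(r"b.t", "bat bit b\nt")

# \b keeps the pattern from matching inside longer words
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

# \s+ splits on any run of whitespace (space, tab, newline)
fields = re.split(r"\s+", "a b\tc\nd")

print(digits, words, line_starts, line_ends)
print(star, plus, optional, dot, whole_word, fields)
```

Raw strings (`r"..."`) are used so that backslashes reach the RegEx engine unchanged.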
Grouping in RegEx
Parentheses group several characters or patterns into a single unit. Together with square brackets and curly brackets, they make up the three common bracket types, each with a different role:
() (Parentheses): Capturing groups. Example:
(abc)+ matches "abc", "abcabc", "abcabcabc", and so on.
[] (Square brackets): Character classes. Example:
[aeiou] matches any single lowercase vowel.
{} (Curly brackets): Quantifiers. Example:
a{2,4} matches "aa", "aaa", or "aaaa".
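To see how the three bracket types behave, here is a short sketch in Python's `re` module. The test strings and the date-like pattern are hypothetical examples, not taken from any particular data set.

```python
import re

# (abc)+ : the whole group "abc" must repeat one or more times
repeated = re.fullmatch(r"(abc)+", "abcabc") is not None
partial = re.fullmatch(r"(abc)+", "abcab") is not None

# Parentheses also capture what they matched, so it can be read back
match = re.search(r"(\d{4})-(\d{2})", "released 2024-05")
year, month = match.groups()

# [aeiou] : any one character from the set
vowels = re.findall(r"[aeiou]", "regular expression")

# a{2,4} : between two and four "a" characters; matching is greedy,
# so a run of five yields "aaaa" and leaves a single "a" unmatched
runs = re.findall(r"a{2,4}", "a aa aaa aaaaa")

print(repeated, partial, year, month)
print(vowels, runs)
```

`re.fullmatch` is used for the first two checks because it requires the entire string to fit the pattern, which makes the effect of the repeated group easier to see.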
Other Useful Operators in RegEx
OR (|): Matches either the pattern on the left or the pattern on the right of |. Example:
cat|dog matches "cat" or "dog".
Negation ([^]): Matches any character except those listed inside the brackets. Example:
[^0-9] matches any non-digit character.
Escape Character (\): Used to escape special characters like . or *. Example:
\. matches a literal period.
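The three operators above can be sketched in Python as follows. The inputs are illustrative only; `re.escape` is a standard-library helper that escapes special characters for you when the text to match comes from elsewhere.

```python
import re

# cat|dog : either alternative; note that it also matches inside "catalog"
pets = re.findall(r"cat|dog", "cat, dog, bird, catalog")

# [^0-9] : any single character that is not a digit
non_digits = re.findall(r"[^0-9]", "a1b22c")

# \. matches only a literal period, while an unescaped . matches any character
literal_dots = re.findall(r"\.", "v1.2.3")
any_chars = re.findall(r".", "v1.2.3")

# re.escape builds a pattern that matches the given text literally
literal_pattern = re.escape("1.5*2")
exact = re.fullmatch(literal_pattern, "1.5*2") is not None
loose = re.fullmatch(literal_pattern, "1x5*2") is not None

print(pets, non_digits, literal_dots, len(any_chars), exact, loose)
```

If whole words are wanted instead of substrings, the alternation can be wrapped in word boundaries, for example `\b(cat|dog)\b`.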
Common Use Cases
Validate an email address:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Extract phone numbers:
\b\d{3}[-.]?\d{3}[-.]?\d{4}\b
Find words that start with a capital letter:
\b[A-Z][a-z]*\b
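The three patterns above can be applied like this in Python. This is a sketch under a few assumptions: the constant names and sample inputs are invented for the example, and the email pattern is a simple plausibility check rather than a full validator of every legal address.

```python
import re

# Patterns copied from the use cases above; the names are illustrative
EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
PHONE = re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")
CAPITALIZED = re.compile(r"\b[A-Z][a-z]*\b")

def looks_like_email(value: str) -> bool:
    """Return True if the whole string fits the simple email pattern."""
    return EMAIL.match(value) is not None

valid = looks_like_email("user.name+tag@example.co.uk")
no_at_sign = looks_like_email("not-an-email")
no_dot_in_domain = looks_like_email("user@localhost")

# findall returns every non-overlapping match in the text
phones = PHONE.findall("Call 555-123-4567 or 555.987.6543, not 12345.")
names = CAPITALIZED.findall("Alice met Bob in New York")

print(valid, no_at_sign, no_dot_in_domain)
print(phones, names)
```

Compiling a pattern once with `re.compile` is a common choice when the same expression is reused many times, since the compiled object can be passed around and called directly.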