Regular Expressions (RegEx)

Introduction

  • Regular expressions (RegEx) are like blueprints for searching and manipulating text. They allow you to find, replace, and validate data with precision. At its core, a regular expression is a sequence of characters and symbols that forms a search pattern.

  • RegEx is integrated into many tools (like grep and sed) and programming languages.

Basic Syntax and Special Characters

  • . (dot): Matches any single character except a newline.

  • * (asterisk): Matches zero or more occurrences of the preceding character or pattern.

  • + (plus): Matches one or more occurrences of the preceding character or pattern.

  • ? (question mark): Matches zero or one occurrence of the preceding character or pattern.

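  • ^ (caret): Matches the beginning of the string, or the beginning of each line in multiline mode and in line-oriented tools such as grep.

  • $ (dollar sign): Matches the end of the string, or the end of each line in multiline mode and in line-oriented tools such as grep.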

  • \d: Matches any digit (0-9).

  • \w: Matches any word character (letters, digits, or underscores).

  • \s: Matches any whitespace character (such as a space, tab, or newline).

  • \b: Matches a word boundary (start or end of a word).
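The characters above can be tried out with a short script. This is a minimal sketch using Python's built-in re module; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text spanning two lines.
text = "Order 42 shipped\nto Bob_7 today"

dot = re.findall(r"t.d", text)                      # . stands in for the 'o' in "today"
star = re.findall(r"ab*", "a ab abbb")              # * allows zero or more 'b'
digits = re.findall(r"\d+", text)                   # + requires at least one digit
optional = re.findall(r"colou?r", "color colour")   # ? makes the 'u' optional
words = re.findall(r"\w+", text)                    # runs of letters, digits, underscores
# re.MULTILINE makes ^ and $ match at every line, not only at the string ends.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)
tokens = re.split(r"\s+", text)                     # split on spaces and the newline
whole_word = re.findall(r"\bto\b", text)            # "to" alone, not the "to" in "today"

print(dot, star, digits, optional)
print(words)
print(first_words, last_words, whole_word)
```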

Grouping in RegEx

  • Grouping lets you treat several characters or patterns as a single unit. Three kinds of brackets appear in patterns, and each has a different role:

    • () (Parentheses): Capturing groups

      • Example: (abc)+ matches "abc", "abcabc", "abcabcabc", and so on.

    • [] (Square brackets): Character classes

      • Example: [aeiou] matches any vowel.

    • {} (Curly brackets): Quantifiers

      • Example: a{2,4} matches "aa", "aaa", or "aaaa".
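The three bracket types can be compared side by side. The following is a minimal sketch with Python's re module; the input strings and variable names are invented for the example.

```python
import re

# () groups "abc" so that + repeats the whole unit; fullmatch requires
# the entire string to fit the pattern.
repeated = re.fullmatch(r"(abc)+", "abcabc")
whole = repeated.group(0)       # the full match
last_unit = repeated.group(1)   # the group keeps its last repetition

# Parentheses also capture parts of a match for later use.
year, month = re.search(r"(\d{4})-(\d{2})", "Released 2024-05").groups()

# [] matches exactly one character from the listed set.
vowels = re.findall(r"[aeiou]", "regular expression")

# {2,4} repeats the preceding 'a' between two and four times (greedy).
runs = re.findall(r"a{2,4}", "a aa aaaaa")

print(whole, last_unit, year, month)
print(vowels, runs)
```

Note that the single "a" is skipped because it is shorter than two characters, and "aaaaa" yields only its first four characters, since the leftover "a" is too short to match again.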

Other Useful Operators in RegEx

  • OR (|): Matches either pattern on the left or right side of |.

    • Example: cat|dog matches "cat" or "dog".

  • Negation ([^...]): Matches any single character except those listed inside the brackets.

    • Example: [^0-9] matches any non-digit character.

  • Escape Character (\): Used to escape special characters like . or *.

    • Example: \. matches a literal period (.).
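These operators can be sketched in Python's re module as follows. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# | tries the left alternative, then the right. Without word boundaries
# it also finds "cat" inside "catalog".
loose = re.findall(r"cat|dog", "cat, dog, catalog")

# Wrapping the alternation in \b...\b keeps whole words only.
# findall returns the contents of the group when a pattern has one.
strict = re.findall(r"\b(cat|dog)\b", "cat, dog, catalog")

# [^0-9] matches every non-digit, so replacing it with "" keeps digits only.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")

# An unescaped . matches any character; \. matches only a literal period.
any_char = re.findall(r"a.b", "a.b axb")
literal = re.findall(r"a\.b", "a.b axb")

# re.escape builds the escaped form of a literal string for you.
escaped = re.escape("a.b")

print(loose, strict, digits_only)
print(any_char, literal, escaped)
```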

Common Use Cases

  • Validate an email address:

    ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

  • Extract phone numbers:

    \b\d{3}[-.]?\d{3}[-.]?\d{4}\b

  • Find capitalized words (a capital letter followed only by lowercase letters):

    \b[A-Z][a-z]*\b
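The three patterns above can be exercised with a short script. This is a minimal sketch in Python; the helper names (is_email, find_phones, find_capitalized) and the sample inputs are hypothetical. The email pattern is a simplified format check, not a full validator.

```python
import re

# Patterns copied from the use cases above.
EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
PHONE = re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b")
CAPITALIZED = re.compile(r"\b[A-Z][a-z]*\b")


def is_email(value: str) -> bool:
    """Return True if the whole string looks like an email address."""
    # fullmatch also rejects a trailing newline, which $ alone would allow.
    return EMAIL.fullmatch(value) is not None


def find_phones(text: str) -> list[str]:
    """Return every 10-digit number, with optional - or . separators."""
    return PHONE.findall(text)


def find_capitalized(text: str) -> list[str]:
    """Return words made of one capital letter plus lowercase letters."""
    return CAPITALIZED.findall(text)


print(is_email("user@example.com"), is_email("user@example"))
print(find_phones("Call 555-123-4567 or 555.987.6543, not 12345."))
print(find_capitalized("Alice met Bob in Paris"))
```

Compiling each pattern once with re.compile keeps the helpers short and avoids repeating the pattern text at every call site.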
