| 1 | +# Regular Expressions |
| 2 | + |
| 3 | +## Basic Matching |
| 4 | +A regular expression (regex) is a pattern that describes a set of strings. Many Linux tools, such as **grep**, **sed**, and **awk**, support regex for searching and transforming text. A literal string is the simplest pattern: it matches itself. |
| 5 | +```[echo](/man/echo) "hello world" | [grep](/man/grep) "hello"``` |
| 6 | +```[grep](/man/grep) -i "error" /var/log/syslog``` |
| 7 | + |
| 8 | +## Anchors |
| 9 | +Anchors match a position rather than a character. |
| 10 | + |
| 11 | +| Pattern | Description | |
| 12 | +|-----|-------------| |
| 13 | +| **^** | Start of line | |
| 14 | +| **$** | End of line | |
| 15 | +| **\b** | Word boundary | |
| 16 | +| **\B** | Non-word boundary | |
| 17 | + |
| 18 | +```[grep](/man/grep) "^#" config.txt``` |
| 19 | +```[grep](/man/grep) "\.conf$" filelist.txt``` |
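To illustrate how anchors narrow a match, the sketch below counts matching lines with and without anchors. The sample lines and variable names are invented for this example.

```shell
# Hypothetical sample input: three lines that all contain "cat".
sample=$(printf 'cat\nconcat\ncat food\n')

# Unanchored: any line containing "cat" counts.
unanchored=$(printf '%s\n' "$sample" | grep -c 'cat')

# ^ pins the match to the start of the line.
at_start=$(printf '%s\n' "$sample" | grep -c '^cat')

# ^ and $ together require the whole line to be exactly "cat".
whole_line=$(printf '%s\n' "$sample" | grep -c '^cat$')

echo "$unanchored $at_start $whole_line"   # prints: 3 2 1
```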
| 20 | + |
| 21 | +## Character Classes |
| 22 | +A character class matches one character from a defined set. |
| 23 | + |
| 24 | +| Pattern | Description | |
| 25 | +|-----|-------------| |
| 26 | +| **.** | Any single character (except newline) | |
| 27 | +| **[abc]** | One of a, b, or c | |
| 28 | +| **[^abc]** | Any character except a, b, or c | |
| 29 | +| **[a-z]** | Any lowercase letter | |
| 30 | +| **[A-Z]** | Any uppercase letter | |
| 31 | +| **[0-9]** | Any digit | |
| 32 | +| **[a-zA-Z0-9]** | Any alphanumeric character | |
| 33 | +| **\d** | Any digit (same as [0-9]) | |
| 34 | +| **\D** | Any non-digit | |
| 35 | +| **\w** | Any word character (letter, digit, underscore) | |
| 36 | +| **\W** | Any non-word character | |
| 37 | +| **\s** | Any whitespace (space, tab, newline) | |
| 38 | +| **\S** | Any non-whitespace character | |
| 39 | + |
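| 40 | +> **\d**, **\w**, and **\s** are Perl-style shortcuts. They work in `grep -P` and most programming languages but are not part of POSIX regex. GNU grep also accepts **\w** and **\s** (but not **\d**) in basic and extended mode as an extension. |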
| 41 | + |
| 42 | +## POSIX Classes |
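| 43 | +POSIX character classes are defined by the POSIX standard, so they are portable across compliant Unix tools. They are only valid inside a bracket expression, so the digit class is written `[[:digit:]]`, not `[:digit:]`. |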
| 44 | + |
| 45 | +| Class | Description | |
| 46 | +|-----|-------------| |
| 47 | +| **[:alpha:]** | Alphabetic characters | |
| 48 | +| **[:digit:]** | Digits (0-9) | |
| 49 | +| **[:alnum:]** | Alphanumeric characters | |
| 50 | +| **[:space:]** | Whitespace characters | |
| 51 | +| **[:upper:]** | Uppercase letters | |
| 52 | +| **[:lower:]** | Lowercase letters | |
| 53 | +| **[:punct:]** | Punctuation characters | |
| 54 | +| **[:print:]** | Printable characters | |
| 55 | +| **[:blank:]** | Space and tab | |
| 56 | + |
| 57 | +```[grep](/man/grep) "[[:digit:]]" data.txt``` |
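Several POSIX classes can share one bracket expression. The following is a minimal sketch, assuming a hypothetical helper named `only_upper_digits` and made-up input, that keeps only lines consisting entirely of uppercase letters and digits.

```shell
# Hypothetical helper: both classes sit inside a single bracket expression,
# and the anchors force the whole line to match.
only_upper_digits() {
  grep -E '^[[:upper:][:digit:]]+$'
}

# Made-up sample data: only AB12 and X9 pass the filter.
codes=$(printf 'AB12\nab12\nX9\nA-1\n' | only_upper_digits)
echo "$codes"
```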
| 58 | + |
| 59 | +## Quantifiers |
| 60 | +Quantifiers control how many times the preceding element must appear. |
| 61 | + |
| 62 | +| Pattern | Description | |
| 63 | +|-----|-------------| |
| 64 | +| **\*** | Zero or more times | |
| 65 | +| **+** | One or more times | |
| 66 | +| **?** | Zero or one time | |
| 67 | +| **{n}** | Exactly n times | |
| 68 | +| **{n,}** | n or more times | |
| 69 | +| **{n,m}** | Between n and m times (inclusive) | |
| 70 | + |
| 71 | +```[grep](/man/grep) -E "o{2,}" words.txt``` |
| 72 | + |
| 73 | +> In basic regex (BRE), `{` and `}` must be escaped with a backslash to act as a quantifier (`\{2,\}`). GNU tools also accept `\+` and `\?` in BRE as extensions. Use `grep -E` for extended regex, where `+`, `?`, `{`, and `}` work without escaping. |
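The same interval quantifier can be written in either flavor. This sketch uses invented sample words to show that the BRE and ERE spellings select the same lines.

```shell
# Made-up word list: "fod" has only one "o".
words=$(printf 'food\nfod\nfoood\n')

# BRE (plain grep): the interval braces are escaped.
bre=$(printf '%s\n' "$words" | grep 'o\{2,\}')

# ERE (grep -E): the same interval without backslashes.
ere=$(printf '%s\n' "$words" | grep -E 'o{2,}')

# Both variables now hold the two lines "food" and "foood".
echo "$bre"
```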
| 74 | + |
| 75 | +## Groups and Alternation |
| 76 | +Parentheses create groups for applying quantifiers or capturing matches. The pipe symbol provides alternation. |
| 77 | + |
| 78 | +| Pattern | Description | |
| 79 | +|-----|-------------| |
| 80 | +| **(abc)** | Group — match "abc" as a unit | |
| 81 | +| **a\|b** | Alternation — match a or b | |
| 82 | +| **\1** | Backreference — match the first captured group again | |
| 83 | +| **\2** | Backreference — match the second captured group | |
| 84 | + |
| 85 | +```[echo](/man/echo) "abcabc" | [grep](/man/grep) -E "(abc)\1"``` |
| 86 | +```[grep](/man/grep) -E "cat|dog" animals.txt``` |
| 87 | + |
| 88 | +Backreferences are useful in **sed** for rearranging matched text. |
| 89 | +```[echo](/man/echo) "John Smith" | [sed](/man/sed) -E "s/(.*) (.*)/\2, \1/"``` |
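Backreferences also work with more than two groups. As a sketch, the hypothetical helper `iso_to_dmy` below reorders a date from `YYYY-MM-DD` to `DD/MM/YYYY`; the function name and the sample date are assumptions made for illustration.

```shell
# Hypothetical helper: capture year, month, and day, then emit them reversed.
# '#' is used as the sed delimiter so the '/' in the output needs no escaping.
iso_to_dmy() {
  sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#'
}

reordered=$(echo "2024-03-15" | iso_to_dmy)
echo "$reordered"   # prints: 15/03/2024
```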
| 90 | + |
| 91 | +## Escaping |
| 92 | +The backslash `\` removes the special meaning of a metacharacter. The special characters that need escaping depend on the regex flavor. |
| 93 | + |
| 94 | +In **extended regex** (ERE), these characters are special: `. * + ? ( ) [ ] { } | ^ $ \` |
| 95 | + |
| 96 | +To match a literal dot or other special character, prefix it with a backslash. |
| 97 | +```[grep](/man/grep) -E "192\.168\.1\.1" hosts.txt``` |
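To see why the escaping matters, this sketch compares an unescaped and an escaped pattern on two invented lines. It also shows `grep -F`, which treats the whole pattern as a fixed string and so needs no escaping at all.

```shell
# Made-up input: the second line is not an IP address.
hosts=$(printf '192.168.1.1\n192x168y1z1\n')

# Unescaped dots match any character, so both lines match.
loose=$(printf '%s\n' "$hosts" | grep -c -E '192.168.1.1')

# Escaped dots match only a literal ".".
strict=$(printf '%s\n' "$hosts" | grep -c -E '192\.168\.1\.1')

# -F disables regex interpretation entirely.
fixed=$(printf '%s\n' "$hosts" | grep -c -F '192.168.1.1')

echo "$loose $strict $fixed"   # prints: 2 1 1
```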
| 98 | + |
| 99 | +## Basic vs Extended Regex |
| 100 | +Linux tools support two main regex flavors. |
| 101 | + |
| 102 | +**Basic Regular Expressions** (BRE) are the default for [grep](/man/grep) and [sed](/man/sed). In BRE, the characters `+`, `?`, `{`, `}`, `(`, `)`, and `|` are treated as literals; you must escape them with `\` to use their special meaning. POSIX defines only `\(`, `\)`, `\{`, and `\}`; `\+`, `\?`, and `\|` are GNU extensions. |
| 103 | + |
| 104 | +**Extended Regular Expressions** (ERE) treat those characters as special by default. Use the **-E** flag with grep and sed to enable ERE; awk uses ERE by default. |
| 105 | +```[grep](/man/grep) -E "error|warning" logfile``` |
| 106 | +```[sed](/man/sed) -E "s/[0-9]+/NUM/g" data.txt``` |
| 107 | + |
| 108 | +Perl-compatible regex (**PCRE**) adds features like lookahead, lookbehind, and non-greedy quantifiers. Use `grep -P` where available. |
| 109 | +```[grep](/man/grep) -P "\d{3}-\d{4}" contacts.txt``` |
| 110 | + |
| 111 | +## Common Examples |
| 112 | +Match lines that look like an email address. |
| 113 | +```[grep](/man/grep) -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" contacts.txt``` |
| 114 | + |
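| 115 | +Match text shaped like an IPv4 address. The pattern does not check that each octet is in the range 0-255, so `999.999.999.999` also matches. |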
| 116 | +```[grep](/man/grep) -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" logfile``` |
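If out-of-range octets need to be rejected, the pattern can be tightened. The following is one possible sketch rather than a definitive validator; the names `octet`, `ipv4_re`, and `is_ipv4` are hypothetical, and the pattern is anchored so it tests a whole string rather than searching within a line.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots, anchored at both ends.
ipv4_re="^${octet}(\\.${octet}){3}\$"

# Hypothetical helper: succeeds (exit status 0) if $1 is a dotted-quad address.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq "$ipv4_re"
}

for addr in 192.168.1.1 999.1.1.1; do
  if is_ipv4 "$addr"; then
    echo "$addr looks valid"
  else
    echo "$addr rejected"
  fi
done
```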
| 117 | + |
| 118 | +Remove empty lines from a file. Lines that contain only spaces or tabs are kept. |
| 119 | +```[sed](/man/sed) "/^$/d" file.txt``` |
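The `^$` pattern matches only lines with no characters at all. A variant that also drops whitespace-only lines might look like the sketch below; `strip_blank` is a hypothetical helper name and the input is invented.

```shell
# Hypothetical helper: delete lines that are empty or hold only whitespace.
strip_blank() {
  sed '/^[[:space:]]*$/d'
}

# Made-up input: line 2 is empty, line 3 contains two spaces and a tab.
cleaned=$(printf 'one\n\n  \t\ntwo\n' | strip_blank)
echo "$cleaned"
```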
| 120 | + |
| 121 | +Remove lines starting with a comment character. |
| 122 | +```[sed](/man/sed) "/^#/d" config.txt``` |
| 123 | + |
| 124 | +Extract the third column from whitespace-separated data. |
| 125 | +```[awk](/man/awk) "{print \$3}" data.txt``` |
| 126 | + |
| 127 | +Replace multiple spaces with a single space. |
| 128 | +```[sed](/man/sed) -E "s/ +/ /g" messy.txt``` |