Python Regular Expressions

RE Module
1. Functions
2. Flag Constants
3. Regular Expression Objects
4. Match Objects
Pattern Syntax
1. Special Characters
2. Character Classes
3. Set
4. Group
5. Or
6. Wildcard
7. Quantifiers
8. Start / End

RE Module

import re
prog = re.compile(pattern)
result = prog.search(string)

Functions

re.compile(pattern, flags=0) compiles a pattern into a regular expression object and returns it.
re.match(pattern, string, flags=0) matches the pattern at the beginning of the string, and returns a match object or None if it does not match.
re.fullmatch(pattern, string, flags=0) matches the pattern to the whole string, and returns a match object or None if it does not match.
re.search(pattern, string, flags=0) scans through the string for the first occurance of the pattern, and returns a match object or None if no position matches.
re.findall(pattern, string, flags=0) returns all non-overlapping matches of the pattern in the string, as a list of strings or tuples (when the pattern includes multiple groups) in the order found.
re.finditer(pattern, string, flags=0) returns an iterator yielding match objects over all non-overlapping matches in the order found.
re.sub(pattern, repl, string, count=0, flags=0) returns the string obtained by replacing the non-overlapping occurrences of the pattern in the string by the replacement string or function. A non-negative integer can be used to specify the maximum number of pattern occurrences to be replaced.
re.split(pattern, string, maxsplit=0, flags=0) splits the string by the occurrences of the pattern and returns the resultant list. A non-negative integer can be used to specify the number of splits, and the remainder of the string is returned as the final element of the list.

Flag Constants

Flag constants passed as the last argument for most of the manipulation functions:

Syntax	Abbr	Description
`re.IGNORECASE`	`re.I`	Perform case-insensitive matching
`re.MULTILINE`	`re.M`	`^` and `$` match the start and end of each line. By default, `^` only for the first line and `$` for the last line
`re.DOTALL`	`re.S`	`.` matches any character at all. By default, it will match anything except a newline
`re.VERBOSE`	`re.X`	Allow multiline patterns by ignoring whitespace and `#` comments except when in a set

Values can be combined using bitwise OR (the | operator) when passed to the function.

Regular Expression Objects

Compiled regular expression objects support methods:

prog.match(string) matches the pattern at the beginning of the string, and returns a match object or None if it does not match.
prog.fullmatch(string) matches the pattern to the whole string, and returns a match object or None if it does not match.
prog.search(string) scans through the string for the first occurance of the pattern, and returns a match object or None if no position matches.
prog.findall(string) returns all non-overlapping matches of the pattern in the string, as a list of strings or tuples (when the pattern includes multiple groups) in the order found.
prog.finditer(string) returns an iterator yielding match objects over all non-overlapping matches in the order found.
prog.sub(repl, string, count=0) returns the string obtained by replacing the non-overlapping occurrences of the pattern in the string by the replacement string or function. A non-negative integer can be used to specify the maximum number of pattern occurrences to be replaced.
prog.split(string, maxsplit=0) splits the string by the occurrences of the pattern and returns the resultant list. A non-negative integer can be used to specify the number of splits, and the remainder of the string is returned as the final element of the list.

Match Objects

Match objects always have a boolean value of True. To access their contents, use methods:

result.group(1,2,...,N) returns a single string if there is a single argument (the whole matched string if the argument equals 0 or is omitted), and returns a tuple with one item per argument if there are multiple arguments.
result.groups() returns a tuple containing all the subgroups of the match.

Pattern Syntax

Usually for the pattern, a raw string notation (an r prefixes the string) is used to exempt python from escaping backslashes.

Special Characters

Syntax	Description
`\n`	Newline / Linefeed
`\r`	Carriage Return
`\t`	Horizontal Tab
`\v`	Vertical Tab
`\f`	Formfeed
`\0`	Null
`\a`	Bell
`\b`	Backspace character (only inside sets)
`\c`	Control character
`\\`	Backslash

Character Classes

Syntax	Description	ASCII Set Equivalence
`\d`	Matches decimal digits	`[0-9]`
`\D`	Matches any character which is not a decimal digit	`[^0-9]`
`\w`	Matches word characters	`[a-zA-Z0-9_]`
`\W`	Matches any character which is not a word character	`[^a-zA-Z0-9_]`
`\s`	Matches whitespace characters	`[ \t\n\r\f\v]`
`\S`	Matches any character which is not a whitespace character	`[^ \t\n\r\f\v]`

Set

[] is used to indicate a set of characters.

Inside sets,

Characters can be listed individually, or ranges can be indicated by giving two characters and separating them by a -;
If the first character of the set is ^, it means the complement;
To match a literal ], place it as the first character or precede it with a backslash for an escape character;
Others symbols lose special meanings and no need to be escaped, but special characters and character classes are still accepted.

Group

() is used to create and capture a group of characters.

Or

| is used to create a regular expression that will match characters on either side.

Wildcard

. is used to match a character that can be anything except a newline. To match a newline as well, use flag constants.

Quantifiers

{} is used to specifies some repeated occurrences of the preceding character or characters grouped by round brackets.

A single number indicates an exact number of occurrences;
Two numbers seperated by a , indicates a range of numbers of occurrences;
An omitted left bound means a lower bound of zero, and an omitted right bound means an infinite upper bound.

There are also special qualifiers for more convenient usage:

Syntax	Description	Equivalent Curly Bracket Denotation
`?`	zero or one occurrences	`{,1}`
`*`	zero or more occurrences	`{,}`
`+`	one or more occurrences	`{1,}`

By default, qualifiers and quantifiers are all greedy (try to match as much text as possible). ? is used to change their behaviors to a non-greedy fashion when postponed to qualifiers or quantifiers.

Start / End

Syntax	Description
`^`	Matches the start of the line
`$`	Matches the end of the line
`\A`	Matches only at the start of the string
`\Z`	Matches only at the end of the string
`\b`	Matches the empty string, but only at the beginning or end of a word (works as word boundaries)
`\B`	Matches the empty string, but only when it is not at the beginning or end of a word

Examples

Phone and email

import re
phoneRegex = re.compile(r'\+?(\d{1,3})(-| )([- ()\d]+)')
emailRegex = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}')

# Example string content sourced from the National Science Foundation website
phoneFound = phoneRegex.search('Call the Help Desk at 1-800-381-1532')
print('Country code: ' + phoneFound.group(1))
print('Local number: ' + phoneFound.group(3))

String strip

import re
def regex_strip(string, chars=None):
    if chars:
        pattern = r'\A[' + chars + r']*(.*?)[' + chars + r']*\Z'
    else:
        pattern = r'\A\s*(.*?)\s*\Z'
    return re.fullmatch(pattern, string, re.DOTALL).group(1)

References

Sweigart, A. (2015). Automate the Boring Stuff With Python: Practical Programming for Total Beginners. San Francisco, CA: No Starch Press.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python Regular Expressions

Table of Contents

RE Module

Functions

Flag Constants

Regular Expression Objects

Match Objects

Pattern Syntax

Special Characters

Character Classes

Set

Group

Or

Wildcard

Quantifiers

Start / End

Examples

References

FilesExpand file tree

py-RegEx.md

Latest commit

History

py-RegEx.md

File metadata and controls

Python Regular Expressions

Table of Contents

RE Module

Functions

Flag Constants

Regular Expression Objects

Match Objects

Pattern Syntax

Special Characters

Character Classes

Set

Group

Or

Wildcard

Quantifiers

Start / End

Examples

References