diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst index 7486a378dbb06f..84ec535ca98e97 100644 --- a/Doc/howto/regex.rst +++ b/Doc/howto/regex.rst @@ -1,7 +1,7 @@ .. _regex-howto: **************************** - Regular Expression HOWTO + Regular expression HOWTO **************************** :Author: A.M. Kuchling @@ -47,7 +47,7 @@ Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable. -Simple Patterns +Simple patterns =============== We'll start by learning about the simplest possible regular expressions. Since @@ -59,7 +59,7 @@ expressions (deterministic and non-deterministic finite automata), you can refer to almost any textbook on writing compilers. -Matching Characters +Matching characters ------------------- Most letters and characters will simply match themselves. For example, the @@ -159,7 +159,7 @@ match even a newline. ``.`` is often used where you want to match "any character". -Repeating Things +Repeating things ---------------- Being able to match varying sets of characters is the first thing regular @@ -210,7 +210,7 @@ this RE against the string ``'abcbd'``. | | | ``[bcd]*`` is only matching | | | | ``bc``. | +------+-----------+---------------------------------+ -| 6 | ``abcb`` | Try ``b`` again. This time | +| 7 | ``abcb`` | Try ``b`` again. This time | | | | the character at the | | | | current position is ``'b'``, so | | | | it succeeds. | @@ -255,7 +255,7 @@ is equivalent to ``+``, and ``{0,1}`` is the same as ``?``. It's better to use to read. -Using Regular Expressions +Using regular expressions ========================= Now that we've looked at some simple regular expressions, how do we actually use @@ -264,7 +264,7 @@ expression engine, allowing you to compile REs into objects and then perform matches with them. -Compiling Regular Expressions +Compiling regular expressions ----------------------------- Regular expressions are compiled into pattern objects, which have @@ -295,7 +295,7 @@ disadvantage which is the topic of the next section. .. _the-backslash-plague: -The Backslash Plague +The backslash plague -------------------- As stated earlier, regular expressions use the backslash character (``'\'``) to @@ -335,7 +335,7 @@ expressions will often be written in Python code using this raw string notation. In addition, special escape sequences that are valid in regular expressions, but not valid as Python string literals, now result in a -:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`, +:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`, which means the sequences will be invalid if raw string notation or escaping the backslashes isn't used. @@ -351,7 +351,7 @@ the backslashes isn't used. +-------------------+------------------+ -Performing Matches +Performing matches ------------------ Once you have an object representing a compiled regular expression, what do you @@ -369,10 +369,10 @@ for a complete listing. | | location where this RE matches. | +------------------+-----------------------------------------------+ | ``findall()`` | Find all substrings where the RE matches, and | -| | returns them as a list. | +| | return them as a list. | +------------------+-----------------------------------------------+ | ``finditer()`` | Find all substrings where the RE matches, and | -| | returns them as an :term:`iterator`. | +| | return them as an :term:`iterator`. | +------------------+-----------------------------------------------+ :meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If @@ -473,7 +473,7 @@ Two pattern methods return all of the matches for a pattern. The ``r`` prefix, making the literal a raw string literal, is needed in this example because escape sequences in a normal "cooked" string literal that are not recognized by Python, as opposed to regular expressions, now result in a -:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See +:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`. See :ref:`the-backslash-plague`. :meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the @@ -491,7 +491,7 @@ result. The :meth:`~re.Pattern.finditer` method returns a sequence of (29, 31) -Module-Level Functions +Module-level functions ---------------------- You don't have to create a pattern object and call its methods; the @@ -518,7 +518,7 @@ Outside of loops, there's not much difference thanks to the internal cache. -Compilation Flags +Compilation flags ----------------- .. currentmodule:: re @@ -642,7 +642,7 @@ of each one. whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by - a ``'#'`` that's neither in a character class or preceded by an unescaped + a ``'#'`` that's neither in a character class nor preceded by an unescaped backslash. For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it @@ -669,7 +669,7 @@ of each one. to understand than the version using :const:`re.VERBOSE`. -More Pattern Power +More pattern power ================== So far we've only covered a part of the features of regular expressions. In @@ -679,7 +679,7 @@ retrieve portions of the text that was matched. .. _more-metacharacters: -More Metacharacters +More metacharacters ------------------- There are some metacharacters that we haven't covered yet. Most of them will be @@ -875,7 +875,7 @@ Backreferences like this aren't often useful for just searching through a string find out that they're *very* useful when performing string substitutions. -Non-capturing and Named Groups +Non-capturing and named groups ------------------------------ Elaborate REs may use many groups, both to capture substrings of interest, and @@ -979,7 +979,7 @@ current point. The regular expression for finding doubled words, 'the the' -Lookahead Assertions +Lookahead assertions -------------------- Another zero-width assertion is the lookahead assertion. Lookahead assertions @@ -1061,7 +1061,7 @@ end in either ``bat`` or ``exe``: ``.*[.](?!bat$|exe$)[^.]*$`` -Modifying Strings +Modifying strings ================= Up to this point, we've simply performed searches against a static string. @@ -1083,7 +1083,7 @@ using the following pattern methods: +------------------+-----------------------------------------------+ -Splitting Strings +Splitting strings ----------------- The :meth:`~re.Pattern.split` method of a pattern splits a string apart @@ -1137,7 +1137,7 @@ argument, but is otherwise the same. :: ['Words', 'words, words.'] -Search and Replace +Search and replace ------------------ Another common task is to find all the matches for a pattern, and replace them @@ -1236,7 +1236,7 @@ pattern object as the first parameter, or use embedded modifiers in the pattern string, e.g. ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``. -Common Problems +Common problems =============== Regular expressions are a powerful tool for some applications, but in some ways @@ -1244,7 +1244,7 @@ their behaviour isn't intuitive and at times they don't behave the way you may expect them to. This section will point out some of the most common pitfalls. -Use String Methods +Use string methods ------------------ Sometimes using the :mod:`re` module is a mistake. If you're matching a fixed @@ -1310,7 +1310,7 @@ string and then backtracking to find a match for the rest of the RE. Use :func:`re.search` instead. -Greedy versus Non-Greedy +Greedy versus non-greedy ------------------------ When repeating a regular expression, as in ``a*``, the resulting action is to @@ -1388,9 +1388,9 @@ Feedback ======== Regular expressions are a complicated topic. Did this document help you -understand them? Were there parts that were unclear, or Problems you +understand them? Were there parts that were unclear, or problems you encountered that weren't covered here? If so, please send suggestions for -improvements to the author. +improvements to the :ref:`issue tracker `. The most complete book on regular expressions is almost certainly Jeffrey Friedl's Mastering Regular Expressions, published by O'Reilly. Unfortunately,