Was debugging some mark expression parsing and noticed the tokenizer was rejecting expressions that should be valid. Traced it to `Tokenizer._iter_tokens` in `src/_pytest/mark/expression.py`, line 105:
elif (quote_char := input[pos]) in ("'", '"'):
    end_quote_pos = input.find(quote_char, pos + 1)
    if end_quote_pos == -1:
        raise SyntaxError(...)
    value = input[pos : end_quote_pos + 1]
    if (backslash_pos := input.find("\\")) != -1:  # <- searches entire input
        raise SyntaxError(
            r'escaping with "\" not supported in marker expression',
            (FILE_NAME, 1, backslash_pos + 1, input),
        )
The `input.find("\\")` call at line 105 scans the entire expression string, not just the current quoted portion. So a backslash anywhere else in the expression, even in an already-parsed IDENT, triggers a false SyntaxError while the quoted string is being tokenized.
Backslashes are explicitly valid in IDENTs per the regex at line 113 (`(:?\w|:|\+|-|\.|\[|\]|\\|/)+`), and the existing test `test_backslash_not_treated_specially` confirms they should be treated as regular identifier characters.
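To make the scoping problem concrete, here is a minimal standalone sketch using only `str.find`. It does not import pytest; the names `expr`, `pos`, and `end_quote_pos` are just stand-ins that mirror the tokenizer's variables.

```python
# Standalone illustration; mirrors the tokenizer's variables but is not pytest code.
expr = "some\\marker and 'hello'"  # runtime value: some\marker and 'hello'

pos = expr.index("'")                    # index of the opening quote
end_quote_pos = expr.find("'", pos + 1)  # index of the closing quote

# Unbounded search, like the current check: finds the backslash in the IDENT.
unbounded = expr.find("\\")

# Bounded search: only looks between the opening and closing quote.
bounded = expr.find("\\", pos, end_quote_pos + 1)

print(pos, end_quote_pos, unbounded, bounded)
```

The unbounded result lies before `pos`, i.e. outside the quoted string, while the bounded search finds nothing.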
Example:
from _pytest.mark.expression import Expression

# Should not hit the escaping error: the backslash is in the IDENT, not in the quoted string.
# Actually raises: SyntaxError: escaping with "\" not supported in marker expression
# (points to the backslash at index 4, which is in the already-parsed IDENT, not in the 'hello' quoted string)
Expression.compile("some\\marker and 'hello'")

# Same false SyntaxError: the reported backslash, at index 16, is in the trailing IDENT.
Expression.compile("'hello' and some\\marker")
The fix is to search only within the current quoted string:
if "\\" in value:  # check only the quoted portion
    raise SyntaxError(...)
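One detail worth noting: a plain `in` check drops the backslash position that the current error reports. Below is a rough standalone sketch of a scoped check that keeps the column, assuming the error should still point into the full input. `read_quoted` and the `"<expr>"` file name are hypothetical names for illustration, not pytest's API.

```python
def read_quoted(expr: str, pos: int) -> str:
    """Return the quoted string starting at expr[pos], rejecting backslashes in it.

    Hypothetical helper for illustration only; not part of pytest.
    """
    quote_char = expr[pos]
    end_quote_pos = expr.find(quote_char, pos + 1)
    if end_quote_pos == -1:
        raise SyntaxError(f'closing quote "{quote_char}" is missing')
    value = expr[pos : end_quote_pos + 1]
    # Search only the quoted portion; the offset is relative to `value`.
    offset = value.find("\\")
    if offset != -1:
        raise SyntaxError(
            r'escaping with "\" not supported in marker expression',
            # 1-based column in the full input, so the caret lands on the backslash.
            ("<expr>", 1, pos + offset + 1, expr),
        )
    return value


# A backslash in a neighbouring IDENT no longer affects the quoted string.
print(read_quoted("some\\marker and 'hello'", 16))
print(read_quoted("'hello' and some\\marker", 0))
```

A backslash inside the quotes is still rejected, with the column computed from `pos` plus the offset within `value`.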
Been there since the tokenizer was introduced. Not caught by tests because none of the existing test expressions combine a quoted string with a backslash-containing IDENT.