mark expression tokenizer: backslash check in quoted string parsing scans entire input, causing false SyntaxError #14534

@EternalRights

Was debugging some mark expression parsing and noticed the tokenizer was rejecting expressions that should be valid. Traced it to Tokenizer._iter_tokens in src/_pytest/mark/expression.py line 105:

elif (quote_char := input[pos]) in ("'", '"'):
    end_quote_pos = input.find(quote_char, pos + 1)
    if end_quote_pos == -1:
        raise SyntaxError(...)
    value = input[pos : end_quote_pos + 1]
    if (backslash_pos := input.find("\\")) != -1:   # <- searches entire input
        raise SyntaxError(
            r'escaping with "\" not supported in marker expression',
            (FILE_NAME, 1, backslash_pos + 1, input),
        )

The input.find("\\") call at line 105 scans the entire expression string, not just the current quoted portion. A backslash anywhere else in the expression, even in an already-parsed IDENT, therefore triggers a false SyntaxError while the quoted string is being parsed.

Backslashes are explicitly valid in IDENTs per the regex at line 113 ((:?\w|:|\+|-|\.|\[|\]|\\|/)+) and the existing test test_backslash_not_treated_specially confirms they should be treated as regular identifier characters.
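To illustrate that a backslash is consumed as part of an IDENT, here is a minimal standalone sketch. It copies the pattern quoted above and applies it with the standard `re` module; the name `IDENT_RE` is made up for this example, and treating the pattern as exactly what the tokenizer uses is an assumption based on the quote.

```python
import re

# Pattern copied from the issue text above. IDENT_RE is a hypothetical name
# used only in this sketch, not a name taken from the pytest source.
IDENT_RE = re.compile(r"(:?\w|:|\+|-|\.|\[|\]|\\|/)+")

# Runtime value of the expression: some\marker and 'hello'
expr = "some\\marker and 'hello'"

# Match at the start of the expression, as a tokenizer would at pos 0.
match = IDENT_RE.match(expr)
ident = match.group(0)

# The backslash is part of the matched identifier; the match stops at the space.
print(ident)
```

The match covers the backslash and ends at the first space, so by the time the quoted string is reached the backslash already belongs to an earlier token.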

Example:

from _pytest.mark.expression import Expression

# Should work: backslash is in IDENT, not in the quoted string
Expression.compile("some\\marker and 'hello'")
# Raises: SyntaxError: escaping with "\" not supported in marker expression
# (points to backslash at position 4, which is in the already-parsed IDENT, not the "'hello'" quoted string)

Expression.compile("'hello' and some\\marker")
# Same false SyntaxError: the backslash at position 16 is in the trailing IDENT (which starts at position 12)

The fix is to search only within the current quoted string:

if "\\" in value:   # check only the quoted portion
    raise SyntaxError(...)
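One possible variant of that fix, sketched here as a standalone function rather than as a patch: bounding `str.find` with start and end indices keeps the check inside the quotes while still producing the column that the existing error tuple reports. `read_quoted`, the placeholder file name, and the missing-quote message are all hypothetical; only the backslash message is taken from the snippet above.

```python
def read_quoted(input: str, pos: int) -> str:
    """Return the quoted token that starts at ``pos`` (hypothetical helper).

    Follows the shape of the quoted-string branch, but restricts the
    backslash search to the text between the two quote characters.
    """
    quote_char = input[pos]
    end_quote_pos = input.find(quote_char, pos + 1)
    if end_quote_pos == -1:
        # Placeholder message; the original elides the real one.
        raise SyntaxError("closing quote is missing")
    # Bounded search: start after the opening quote, stop before the closing one.
    backslash_pos = input.find("\\", pos + 1, end_quote_pos)
    if backslash_pos != -1:
        raise SyntaxError(
            r'escaping with "\" not supported in marker expression',
            ("<hypothetical>", 1, backslash_pos + 1, input),
        )
    return input[pos : end_quote_pos + 1]


# Backslash in the IDENT only: the quoted token is accepted.
expr = "some\\marker and 'hello'"
token = read_quoted(expr, expr.index("'"))
print(token)
```

A backslash inside the quotes, as in `read_quoted("'a\\b'", 0)`, still raises, and the reported column points at that backslash. Compared with `"\\" in value`, the bounded `find` avoids having to recompute the position separately for the error details.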

Been there since the tokenizer was introduced. Not caught by tests because none of the existing test expressions combine a quoted string with a backslash-containing IDENT.