_log_response / _log_request: XML pretty-printing crashes on non-ASCII content #121

@TiagoMLucio

Summary

lxml.etree.XMLSyntaxError is raised in httpops.py when logging HTTP responses (or requests) whose XML contains non-ASCII UTF-8 characters (e.g. accented letters in attribute values).

Root Cause

In _log_response (and _log_request), the code converts response.content (bytes) to a printable string via repr(), then feeds it back into ET.fromstring():

rawtext = repr(response.content)[2:-1]   # bytes → escaped str  ("héllo" → "h\\xc3\\xa9llo")
# ...substitutions for \r, \n, \t...
tree = ET.fromstring(rawtext.encode())    # str → bytes with literal backslashes → invalid XML

repr() turns non-ASCII bytes into Python escape sequences (\xc3\xa9). Re-encoding that string does not restore the original UTF-8 bytes; it produces bytes containing literal backslash-x sequences. lxml is then handed this escaped text rather than the original document, and in the reported case it fails with:

lxml.etree.XMLSyntaxError: error parsing attribute name, line 46, column 14

The same pattern exists in _log_request (repr(request.body)[1:-1]).
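The lossy round-trip can be checked without any XML parser at all. The following is a minimal sketch using only built-ins; the sample document and the variable names are illustrative assumptions, not code from httpops.py.

```python
# Illustrative sample, not taken from httpops.py.
content = '<root name="café"/>'.encode("utf-8")

# The pattern described above: repr() the bytes, strip the b'...' wrapper,
# then encode the resulting str again.
rawtext = repr(content)[2:-1]
roundtrip = rawtext.encode()

# Decoding the original bytes is what actually recovers the text.
restored = content.decode("utf-8")

print(roundtrip == content)  # the re-encoded bytes differ from the original
print(rawtext)               # contains a literal backslash-x escape
print(restored)              # contains the real accented character
```

The comparison shows that encode() is not the inverse of repr(): the two UTF-8 bytes of the accented character have become eight ASCII characters.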

Affected Code

Suggested Fix

Parse XML from the original bytes (response.content / request.body) instead of the repr()-then-encode() round-trip. Wrap in try/except so malformed content doesn't crash logging:

# _log_response — before
tree = ET.fromstring(rawtext.encode())
ET.indent(tree, space="  ")
rawtext = ET.tostring(tree).decode()

# _log_response — after
try:
    tree = ET.fromstring(response.content)
    ET.indent(tree, space="  ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass  # keep rawtext as-is

# _log_request — before
tree = ET.fromstring(rawtext)
ET.indent(tree, space="       ")
rawtext = ET.tostring(tree)

# _log_request — after
try:
    body_bytes = request.body if isinstance(request.body, bytes) else request.body.encode('utf-8')
    tree = ET.fromstring(body_bytes)
    ET.indent(tree, space="       ")
    rawtext = ET.tostring(tree, encoding='unicode')
except Exception:
    pass  # keep rawtext as-is
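
Both branches could share one helper. The sketch below is one possible shape, not the project's code: the name pretty_xml_for_log is hypothetical, and it uses the standard library's xml.etree.ElementTree (Python 3.9+ for indent) as a stand-in so that it runs without lxml. With lxml the equivalent exception to catch would be lxml.etree.XMLSyntaxError.

```python
import xml.etree.ElementTree as ET  # stand-in; the issue's code uses lxml


def pretty_xml_for_log(body, indent="  "):
    """Return pretty-printed XML for logging, or a readable fallback.

    Hypothetical helper: `body` may be bytes (response.content),
    str (a request body), or None.
    """
    if body is None:
        return ""
    # Always hand the parser bytes, never a repr() of them.
    raw = body if isinstance(body, bytes) else str(body).encode("utf-8")
    try:
        tree = ET.fromstring(raw)
        ET.indent(tree, space=indent)
        return ET.tostring(tree, encoding="unicode")
    except ET.ParseError:
        # Not XML (or malformed): log the text itself instead of crashing.
        return raw.decode("utf-8", errors="replace")


print(pretty_xml_for_log('<root name="café"><a/></root>'.encode("utf-8")))
print(pretty_xml_for_log(b"not xml"))
```

Catching only the parser's own exception, rather than Exception, keeps unrelated bugs visible while still protecting the logging path. Returning decoded text in the fallback also avoids logging the escaped repr() form.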

Reproduction

Any OSLC/DNG response containing non-ASCII UTF-8 characters in XML attribute values (e.g. requirement titles with accented characters) will trigger the crash during TRACE-level logging.

Example

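import lxml.etree as ET  # assumed: ET in httpops.py is lxml.etree, per the XMLSyntaxError above

content = '<?xml version="1.0"?><root name="café"/>'.encode('utf-8')
rawtext = repr(content)[2:-1]
# rawtext = '<?xml version="1.0"?><root name="caf\\xc3\\xa9"/>'
tree = ET.fromstring(rawtext.encode())
# The parser sees the escaped text, not the original bytes: "café" is not recovered.
# On the reported response this round-trip raised XMLSyntaxError.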
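
To compare the two parse paths side by side, the sketch below runs the same input through both, using the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption, so it runs without third-party packages). Whether the escaped form raises or only yields a mangled value may depend on the input and the parser, so the sketch handles both outcomes.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree

content = '<?xml version="1.0"?><root name="café"/>'.encode("utf-8")
escaped = repr(content)[2:-1].encode()  # the round-trip from the issue

# Path 1: parse the original bytes directly.
from_original = ET.fromstring(content).get("name")

# Path 2: parse the repr()-then-encode() bytes.
try:
    from_escaped = ET.fromstring(escaped).get("name")
except ET.ParseError as exc:
    from_escaped = f"<parse error: {exc}>"

print(from_original)
print(from_escaped)
```

In either outcome the second path fails to return the original attribute value, which is why parsing the original bytes is the safer choice.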
