
Quadratic time in xml.etree.ElementTree when parsing text with a large number of comments #150096

@StanFromIreland


Bug report

Bug description:

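```python
import time
import xml.etree.ElementTree as ET

# The root's text is N copies of "x", each followed by an empty comment.
for N in (5000, 10000, 20000, 40000, 80000):
    data = b"<r>" + b"x<!---->" * N + b"</r>"
    # Time only the parse itself.
    s = time.perf_counter()
    ET.fromstring(data)
    dt = time.perf_counter() - s
    print(f"{N}  {dt}s")
```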

I see (each doubling of N roughly quadruples the parse time):

```
$ python repro.py
5000  0.025435873976675794s
10000  0.0888163199997507s
20000  0.3610062320076395s
40000  1.4099939750158228s
80000  5.41402202800964s
```
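As a quick check of the "quadratic" claim, the reported timings can be turned into doubling ratios. If t(N) grows like N^k, doubling N multiplies t by 2^k, so log2 of each ratio estimates k, and a least-squares fit of log t against log N gives an overall estimate. A minimal sketch that uses only the numbers from the report (the helper names `doubling_ratios` and `fitted_exponent` are hypothetical):

```python
import math

# Timings copied from the report: N -> seconds spent in ET.fromstring().
timings = {
    5000: 0.025435873976675794,
    10000: 0.0888163199997507,
    20000: 0.3610062320076395,
    40000: 1.4099939750158228,
    80000: 5.41402202800964,
}


def doubling_ratios(t):
    """Return (N, t(N) / t(previous N), local exponent k) for each step."""
    ns = sorted(t)
    rows = []
    for prev, cur in zip(ns, ns[1:]):
        ratio = t[cur] / t[prev]
        # For t ~ c * N**k: ratio = (cur / prev) ** k, so solve for k.
        k = math.log2(ratio) / math.log2(cur / prev)
        rows.append((cur, ratio, k))
    return rows


def fitted_exponent(t):
    """Least-squares slope of log(t) against log(N)."""
    pairs = sorted(t.items())
    xs = [math.log(n) for n, _ in pairs]
    ys = [math.log(v) for _, v in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    for n, ratio, k in doubling_ratios(timings):
        print(f"N={n:>6}  ratio={ratio:.2f}  local exponent={k:.2f}")
    print(f"fitted exponent: {fitted_exponent(timings):.2f}")
```

On the reported numbers the doubling ratios fall between roughly 3.5 and 4.1 and the fitted exponent comes out near 1.95, close to the 2 expected for quadratic growth; a linear-time parse would give ratios near 2 and an exponent near 1.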

Found by OSS-Fuzz.
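A possible follow-up, not part of the original report: time the same parse with the comments removed and with the text removed, to see whether the slowdown needs text and comments interleaved or whether either alone is enough. This is a minimal diagnostic sketch; the payload names, the `time_parse` helper, and the smaller N values (chosen so it finishes quickly even if some variants are quadratic) are illustrative assumptions, and no results are claimed here.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical control variants alongside the reported payload.
PAYLOADS = {
    "text+comments": lambda n: b"<r>" + b"x<!---->" * n + b"</r>",
    "text only": lambda n: b"<r>" + b"x" * n + b"</r>",
    "comments only": lambda n: b"<r>" + b"<!---->" * n + b"</r>",
}


def time_parse(data):
    """Parse data with ET.fromstring and return (elapsed seconds, root)."""
    start = time.perf_counter()
    root = ET.fromstring(data)
    return time.perf_counter() - start, root


if __name__ == "__main__":
    for name, make in PAYLOADS.items():
        for n in (5000, 10000, 20000):
            elapsed, _ = time_parse(make(n))
            print(f"{name:<14} N={n:>6}  {elapsed:.4f}s")
```

With the default parser, comments are not kept in the tree, so the "text+comments" and "text only" payloads should both produce a root whose text is N copies of "x"; only the timing is expected to differ.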

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response
