Skip to content

tomllib: quadratic parse time from re-walking a deep table header for every key (related to gh-149231) #152930

Description

@tonghuaroot

Bug report

Bug description:

tomllib.loads re-traverses the whole table header path from the document root
once for every key/value pair under that header. A document with one table
header of depth D followed by M sibling keys is parsed in O(M * D) time.

key_value_rule (Lib/tomllib/_parser.py) computes
abs_key_parent = header + key_parent for each pair and then calls both
out.flags.is_(abs_key_parent, Flags.FROZEN) and
out.data.get_or_create_nest(abs_key_parent). Flags.is_ and
get_or_create_nest each walk the full abs_key_parent path from their
respective roots, so the shared header prefix is re-walked for every one of
the M keys. Nothing anchors the per-key work on the current table section.

Reproducer (single deep header, many single-part keys, all spec-valid):

import tomllib, time
D, M = 998, 80000
doc = "[" + ".".join(f"h{i}" for i in range(D)) + "]\n" + "".join(f"k{i}=1\n" for i in range(M))
t = time.perf_counter(); tomllib.loads(doc); print(len(doc), "bytes", time.perf_counter() - t, "s")

Measured (scaling D = M = L, so bytes are O(L) but time is O(L^2)):

L(=D=M)    bytes     time
   1000    11.5 KB   0.14 s
   2000    25.2 KB   0.57 s   (4.2x for 2x)
   4000    52.5 KB   2.19 s   (3.9x for 2x)

A single 52 KB file blocks the thread for over 2 seconds.

How this differs from gh-149231

gh-149231 fixed an O(N^2) blowup from a single key with N dotted parts, by
adding MAX_KEY_PARTS = sys.getrecursionlimit() in parse_key. This is a
different mechanism: the cost here comes from re-walking a shared header for
every one of M keys, and each key in the reproducer has exactly one part, so
the parse_key cap never fires on the keys. MAX_KEY_PARTS bounds the header
depth D to about 1000 but does not remove the per-key re-walk, so a residual
O(1000 * M) amplification survives on builds that carry the cap:

D=998, M=80000  ->  714 KB  ->  ~11.7 s   (legal even with the MAX_KEY_PARTS cap)

Suggested direction

Resolve the header's flags and nest node once per table section and have
key_value_rule walk only the relative key_parent (length 0 for the common
single-part key), turning each section from O(M * D) into O(M + D).

CPython versions tested on:

3.12, 3.13, 3.14, 3.15 (main). The full O(input^2) reproduces on 3.12 and on
3.14.5 as released (no MAX_KEY_PARTS yet); the O(1000 * M) residual reproduces
on 3.13, current 3.14, and main (which carry the gh-149231 cap).

Operating systems tested on:

macOS (also platform independent, pure Python parser)

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    pendingThe issue will be closed if no feedback is providedstdlibStandard Library Python modules in the Lib/ directorytype-bugAn unexpected behavior, bug, or error

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions