tomllib: quadratic parse time from re-walking a deep table header for every key (related to gh-149231)

## Bug report

### Bug description:

`tomllib.loads` re-traverses the whole table header path from the document root
once for every key/value pair under that header. A document with one table
header of depth `D` followed by `M` sibling keys is parsed in `O(M * D)` time.

`key_value_rule` (`Lib/tomllib/_parser.py`) computes
`abs_key_parent = header + key_parent` for each pair and then calls both
`out.flags.is_(abs_key_parent, Flags.FROZEN)` and
`out.data.get_or_create_nest(abs_key_parent)`. `Flags.is_` and
`get_or_create_nest` each walk the full `abs_key_parent` path from their
respective roots, so the shared `header` prefix is re-walked for every one of
the `M` keys. Nothing anchors the per-key work on the current table section.

Reproducer (single deep header, many single-part keys, all spec-valid):

```python
import tomllib, time
D, M = 998, 80000
doc = "[" + ".".join(f"h{i}" for i in range(D)) + "]\n" + "".join(f"k{i}=1\n" for i in range(M))
t = time.perf_counter(); tomllib.loads(doc); print(len(doc), "bytes", time.perf_counter() - t, "s")
```

Measured (scaling `D = M = L`, so bytes are `O(L)` but time is `O(L^2)`):

```
L(=D=M)    bytes     time
   1000    11.5 KB   0.14 s
   2000    25.2 KB   0.57 s   (4.2x for 2x)
   4000    52.5 KB   2.19 s   (3.9x for 2x)
```

A single 52 KB file blocks the thread for over 2 seconds.

### How this differs from gh-149231

gh-149231 fixed an `O(N^2)` blowup from a single key with `N` dotted parts, by
adding `MAX_KEY_PARTS = sys.getrecursionlimit()` in `parse_key`. This is a
different mechanism: the cost here comes from re-walking a shared header for
every one of `M` keys, and each key in the reproducer has exactly one part, so
the `parse_key` cap never fires on the keys. `MAX_KEY_PARTS` bounds the header
depth `D` to about 1000 but does not remove the per-key re-walk, so a residual
`O(1000 * M)` amplification survives on builds that carry the cap:

```
D=998, M=80000  ->  714 KB  ->  ~11.7 s   (legal even with the MAX_KEY_PARTS cap)
```

### Suggested direction

Resolve the header's flags and nest node once per table section and have
`key_value_rule` walk only the relative `key_parent` (length 0 for the common
single-part key), turning each section from `O(M * D)` into `O(M + D)`.

### CPython versions tested on:

3.12, 3.13, 3.14, 3.15 (main). The full `O(input^2)` reproduces on 3.12 and on
3.14.5 as released (no MAX_KEY_PARTS yet); the `O(1000 * M)` residual reproduces
on 3.13, current 3.14, and main (which carry the gh-149231 cap).

### Operating systems tested on:

macOS (also platform independent, pure Python parser)



### Linked PRs
* gh-152931

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

tomllib: quadratic parse time from re-walking a deep table header for every key (related to gh-149231) #152930

Bug report

Bug description:

How this differs from gh-149231

Suggested direction

CPython versions tested on:

Operating systems tested on:

Linked PRs

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

tomllib: quadratic parse time from re-walking a deep table header for every key (related to gh-149231) #152930

Description

Bug report

Bug description:

How this differs from gh-149231

Suggested direction

CPython versions tested on:

Operating systems tested on:

Linked PRs

Metadata

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Issue actions