Skip to content

fix: thread label-only row-group headers in real docling shapes (0.6.3)#14

Merged
maish merged 3 commits into
mainfrom
fix-label-only-rowgroup-threading
Jun 14, 2026
Merged

fix: thread label-only row-group headers in real docling shapes (0.6.3)#14
maish merged 3 commits into
mainfrom
fix-label-only-rowgroup-threading

Conversation

@maish

@maish maish commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Release 0.6.3 — fixes the comprehensive label-only-row-group bug report (real docling export_to_html shapes).

Bug

Grouped values were dropping their line-item title from row_headers, collapsing every line-item's limits under the section header. Three real shapes failed:

  1. Narrow title → full-width description band → values (2-col schedule, M1). The title's row-group extent terminated at the description band → title dropped.
  2. Multi-col matrix: number-key col + title → description → plan×cover values (M2). The multi-cell title row (10 | Travel delay) was rejected by the exactly-one-cell guard → "Travel delay" dropped.
  3. Two-column Label | Value schedules didn't promote the stub column under a single-row thead.

Fixes

  • Extent through description band: a full-width band immediately under a title is absorbed as a nested header member (title outermost), bounded by the next title.
  • Multi-cell titles: a label-only row is a group header when ≤1 label cell is numeric-only (admits number+title; still rejects Average: | 80.2 | 10.7 | 3.3). Repeating key columns are excluded from the promoted title.
  • Signal D: promote the left column of a 2-column Label | Value table to the row-label/stub even under a single-row thead (scoped to max_cols == 2).

Verification

  • Full suite green (4861 passed); ruff + mypy clean.
  • 4 existing golds improved (2-col relational tables now bind row↔value as one record per line); no value or label text dropped.
  • New fixtures: matrix/label-only-title-then-description-band, matrix/label-only-title-number-key-matrix.

Known limitation (pre-existing, separate — tracked for follow-up)

A <n>. | Group: | (empty) sub-grouped header with a colspan title over a promoted descriptor column still falls back to flat (the spanned cell trips the gate's "rules originate from <td>" invariant). This is a gate interaction present before this change, not the label-only-threading path.

Also includes a small hygiene commit: stop tracking .claude/ local config in this public repo.

🤖 Generated with Claude Code

maish and others added 3 commits June 14, 2026 21:29
Untrack the committed .claude/settings.json and gitignore the whole .claude/
directory so personal Claude Code config (permissions, skills) is never pushed
to this public repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le shapes

Three shapes from real TableItem.export_to_html output still dropped the
line-item title from grouped values' row_headers; 0.6.x only handled the
idealized forms.

- Narrow title -> full-width description band -> values: the title's row-group
  extent now extends THROUGH an immediately-following full-width description
  band (absorbed as a nested header member, not a boundary), so the title
  threads as the outer ancestor instead of terminating the extent and being
  dropped. The absorbed band is bounded by the same extent so it does not leak
  past the next title.
- Multi-cell title rows (leading item number/key + textual title, "10 | Travel
  delay" / "3. | Permanent loss of:"): a label-only row is a group header when
  at most ONE of its label cells is numeric-only. This admits number+title while
  still rejecting a data row whose value columns merely happen to be empty
  ("Average: | 80.2 | 10.7 | 3.3", >=2 numeric) -- replaces the old exactly-one-
  cell guard. A repeating key column (same col+text on the group's value rows)
  is excluded from the promoted title so it is not duplicated in the path.
- Two-column Label|Value schedules: Signal D promotes the left column to the
  row-label/stub even under a single-row thead header, scoped to max_cols == 2
  so multi-column property tables are untouched. Also yields proper
  one-record-per-line output for plain 2-col relational tables
  ("North | Sales: 100") instead of two disconnected "Header: value" lines.

4 existing golds improve (2-col relational tables now bind row<->value); no
value or label text dropped. New fixtures label-only-title-then-description-band
and label-only-title-number-key-matrix. Full suite green; ruff + mypy clean.

Known limitation (pre-existing, separate gate interaction, tracked for
follow-up): a "<n>. | Group: | (empty)" sub-grouped header with a colspan title
over a promoted descriptor column still falls back to flat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maish maish merged commit 6521b09 into main Jun 14, 2026
7 checks passed
@maish maish deleted the fix-label-only-rowgroup-threading branch June 14, 2026 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant