docs: specify data overlay files for the table format #7381

Draft
wjones127 wants to merge 1 commit into lance-format:main from wjones127:feat-patch-files

@wjones127

Contributor

Adds a specification for data overlay files: small files attached to a fragment that supply new values for a subset of (row offset, field) cells without rewriting the base data files. They make cell-level updates cheap when only a small fraction of rows and/or columns change.

This PR is spec + proto only — no read/write implementation yet.

Changes

  • protos/table.proto
    • Rework DataOverlayFile: a oneof coverage { bytes shared_offset_bitmap | FieldCoverage field_coverage } to support both dense (rectangular) and sparse overlays; add the FieldCoverage message.
    • Rename read_version → committed_version (uint64), with effective/commit-stamped semantics so overlay-vs-index ordering is correct.
    • Drop the in-file offset key column in favor of rank-based addressing off the coverage bitmap.
    • Document reader feature flag 64 (and previously-undocumented 16/32).
  • docs/src/format/table/data_overlay_file.md (new): full specification — coverage/resolution, deletion precedence, NULL-override, layout + rank addressing, dense vs. sparse, versioning, field-aware index exclusion with flat re-evaluation, the correctness invariant, both compaction modes, row lineage, a worked example (write → read → index query → sparse write → read → compaction), and a guidance stub with open questions.
  • docs/src/format/table/index.md: concise overview + link to the new spec (replacing the earlier inline sketch).
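
For reviewers less familiar with rank-based addressing, here is a minimal Python sketch of the idea for a single field with a shared coverage bitmap. It is not the Lance implementation: it assumes that the rank of a covered row offset (the number of covered offsets strictly below it) gives the position of that row's value in the overlay, and it uses a plain Python integer as a stand-in for the real bitmap encoding. The names `make_bitmap`, `rank`, and `read_cell` are hypothetical.

```python
from typing import Any, Iterable, Sequence


def make_bitmap(offsets: Iterable[int]) -> int:
    """Build a toy coverage bitmap: bit i is set when row offset i is covered."""
    bitmap = 0
    for off in offsets:
        bitmap |= 1 << off
    return bitmap


def rank(bitmap: int, offset: int) -> int:
    """Count the covered offsets strictly below `offset`."""
    below = bitmap & ((1 << offset) - 1)
    return bin(below).count("1")


def read_cell(base: Sequence[Any], bitmap: int,
              overlay_values: Sequence[Any], offset: int) -> Any:
    """Return the overlay value when the offset is covered, else the base value."""
    if (bitmap >> offset) & 1:
        # Assumption: overlay values are stored in ascending offset order,
        # so no offset key column is needed to locate them.
        return overlay_values[rank(bitmap, offset)]
    return base[offset]


# Base column for one fragment, with an overlay covering offsets 1 and 3.
base = [10, 20, 30, 40, 50]
bitmap = make_bitmap([1, 3])
overlay_values = [21, 41]

resolved = [read_cell(base, bitmap, overlay_values, i) for i in range(len(base))]
print(resolved)  # [10, 21, 30, 41, 50]
```

Because the position is derived from the bitmap, the overlay stores only the replacement values, which is the motivation given above for dropping the offset key column.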

Out of scope / follow-ups

  • Write transaction shape (new Operation variant in transaction.proto + Rust).
  • Writer support for unequal-length columns (needed for single-file sparse overlays).
  • Coverage bitmap external spill for very large coverage.
  • Per-fragment vs. per-table overlays / LSM analogy (open question in the doc).

🤖 Generated with Claude Code

Add a specification for data overlay files: small files attached to a
fragment that supply new values for a subset of (row offset, field) cells
without rewriting the base data files, for cheap cell-level updates.

- protos/table.proto: rework DataOverlayFile with a dense/sparse coverage
  oneof (shared_offset_bitmap vs new FieldCoverage), rename read_version to
  committed_version (effective, commit-stamped), and document rank-based
  addressing with no offset column. Document reader feature flag 64.
- docs: add data_overlay_file.md (full spec, worked example, guidance stub)
  and link it from the table format overview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the label "A-format" (On-disk format: protos and format spec docs) on Jun 19, 2026
@github-actions

Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions (bot) added the label "documentation" (Improvements or additions to documentation) on Jun 19, 2026