docs: specify data overlay files for the table format #7381
Draft
wjones127 wants to merge 1 commit into
Add a specification for data overlay files: small files attached to a fragment that supply new values for a subset of (row offset, field) cells without rewriting the base data files, for cheap cell-level updates.

- protos/table.proto: rework DataOverlayFile with a dense/sparse coverage oneof (shared_offset_bitmap vs new FieldCoverage), rename read_version to committed_version (effective, commit-stamped), and document rank-based addressing with no offset column. Document reader feature flag 64.
- docs: add data_overlay_file.md (full spec, worked example, guidance stub) and link it from the table format overview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Important This PR touches the Lance format specification. Substantive changes to the format specification — the If this is a meaningful format change:
Adds a specification for data overlay files: small files attached to a fragment that supply new values for a subset of `(row offset, field)` cells without rewriting the base data files. They make cell-level updates cheap when only a small fraction of rows and/or columns change.

This PR is spec + proto only: no read/write implementation yet.
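To make the idea concrete, here is a minimal Python sketch of cell-level resolution. It is only an illustration under stated assumptions, not the layout or API this PR specifies: the `read_cell` helper, the dict-based `base` columns, and the dict-based `overlay` are all hypothetical, and the sketch assumes that a covered cell holding `None` reads back as NULL.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory model: base data is column name -> list of values,
# and the overlay maps (row offset, field) -> replacement value.
Base = Dict[str, List[Any]]
Overlay = Dict[Tuple[int, str], Any]


def read_cell(base: Base, overlay: Overlay, row_offset: int, field: str) -> Any:
    """Return the overlay value if the cell is covered, else the base value."""
    key = (row_offset, field)
    if key in overlay:
        # Assumption: a covered cell may hold None, which overrides the
        # base value with NULL rather than falling through to it.
        return overlay[key]
    return base[field][row_offset]


base = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
overlay = {(1, "name"): "B", (2, "name"): None}

merged_names = [read_cell(base, overlay, i, "name") for i in range(3)]
merged_ids = [read_cell(base, overlay, i, "id") for i in range(3)]
```

The base columns are never modified; only the reads change, which is what keeps a small update cheap in this model.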
Changes
- `protos/table.proto`
  - `DataOverlayFile`: a `oneof coverage { bytes shared_offset_bitmap | FieldCoverage field_coverage }` to support both dense (rectangular) and sparse overlays; add the `FieldCoverage` message.
  - `read_version` → `committed_version` (`uint64`), with effective/commit-stamped semantics so overlay-vs-index ordering is correct.
  - Document reader feature flag `64` (and previously-undocumented `16`/`32`).
- `docs/src/format/table/data_overlay_file.md` (new): full specification, including coverage/resolution, deletion precedence, NULL-override, layout + rank addressing, dense vs. sparse, versioning, field-aware index exclusion with flat re-evaluation, the correctness invariant, both compaction modes, row lineage, a worked example (write → read → index query → sparse write → read → compaction), and a guidance stub with open questions.
- `docs/src/format/table/index.md`: concise overview + link to the new spec (replacing the earlier inline sketch).

Out of scope / follow-ups
- `Operation` variant in `transaction.proto` + Rust).

🤖 Generated with Claude Code
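The rank addressing mentioned in the changes above can be sketched roughly as follows. This is an illustration under assumptions, not the spec's encoding: the set of covered row offsets is modeled as a sorted Python list instead of the serialized `shared_offset_bitmap`, and `overlay_row` is a hypothetical helper. The point is that a covered offset's rank (the number of covered offsets below it) gives its row position in the overlay file, so no offset column has to be stored.

```python
from bisect import bisect_left
from typing import List, Optional


def overlay_row(covered_offsets: List[int], row_offset: int) -> Optional[int]:
    """Map a fragment row offset to its row position in the overlay.

    `covered_offsets` must be sorted and duplicate-free. Returns None
    when the offset is not covered by the overlay.
    """
    rank = bisect_left(covered_offsets, row_offset)
    if rank < len(covered_offsets) and covered_offsets[rank] == row_offset:
        return rank
    return None


covered = [2, 5, 9]            # fragment row offsets the overlay covers
new_values = ["x", "y", "z"]   # overlay rows, stored in rank order

positions = [overlay_row(covered, o) for o in (2, 5, 9, 3)]
value_for_5 = new_values[overlay_row(covered, 5)]
```

A real bitmap would answer the same rank query without materializing a list; the sorted list is used here only to keep the sketch self-contained.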