diff --git a/docs/src/format/table/.pages b/docs/src/format/table/.pages
index 16c20058608..5b0cb0e95e6 100644
--- a/docs/src/format/table/.pages
+++ b/docs/src/format/table/.pages
@@ -6,4 +6,5 @@ nav:
- Layout: layout.md
- Branch & Tag: branch_tag.md
- Row ID & Lineage: row_id_lineage.md
+ - Data Overlay Files: data_overlay_file.md
- MemTable & WAL: mem_wal.md
diff --git a/docs/src/format/table/data_overlay_file.md b/docs/src/format/table/data_overlay_file.md
new file mode 100644
index 00000000000..5540f860018
--- /dev/null
+++ b/docs/src/format/table/data_overlay_file.md
@@ -0,0 +1,390 @@
+# Data Overlay Files
+
+!!! note "Overlay files require feature flag 64 (data overlay files)"
+
+ A reader or writer that does not understand overlay files must refuse a
+ dataset that uses them. Silently ignoring an overlay would return stale base
+ values, which is a correctness bug rather than a degraded experience.
+
+Overlay files supply new values for a subset of `(row offset, field)` cells
+within a fragment **without rewriting the fragment's base data files**. They make
+updates cheap when only a small fraction of rows and/or columns change: instead
+of rewriting whole columns or moving rows to a new fragment, a writer appends a
+small file carrying just the changed cells.
+
+This is Lance's third mechanism for changing data in place, alongside
+[deletion files](index.md#deletion-files) (which remove rows) and
+[data evolution](index.md#data-evolution) (which adds or rewrites whole columns).
+An overlay changes individual cells.
+
+## Concepts
+
+### Coverage and resolution
+
+Each overlay declares which cells it provides through a **coverage** bitmap (or,
+for sparse overlays, one bitmap per field). The bitmaps index **physical row
+offsets** — positions in the base data files, counting deleted rows — so they are
+stable across deletions, exactly like deletion vectors.
+
+To resolve a cell `(offset, field)` on read, walk the fragment's overlays from
+**newest to oldest**. The first overlay that covers `(offset, field)` wins; its
+value is used. If no overlay covers the cell, the value falls through to the base
+data file (or is `NULL` if no base data file holds that field).
+
+Precedence among overlays is determined by:
+
+1. `committed_version` — higher wins (see [Versioning](#versioning-and-ordering)).
+2. Position in `DataFragment.overlays` as a tiebreaker — a later entry is newer.
+
+A covered offset whose value is `NULL` overrides the cell **to** `NULL`. This is
+distinct from an offset that is simply absent from the bitmap, which falls
+through to the base. Coverage, not value-nullness, decides whether an overlay
+applies.
+
+### Interaction with deletions
+
+Deletions take precedence over overlays. If a row offset is marked deleted in the
+fragment's deletion file, any overlay value for that offset is dead and is
+ignored, regardless of commit order. This keeps the invariant simple: a deletion
+is the final word on a row, so a concurrent overlay against a row that was
+deleted needs no special conflict handling — its values are merely inert.
+
+### Physical layout
+
+An overlay's data file stores **one value column per field**, in the order of
+`data_file.fields`. It does **not** store a row-offset key column. The position of
+a covered offset's value within its column is the **rank** of that offset in the
+field's coverage bitmap — the number of set bits below it. For a Roaring bitmap
+this is an O(1) operation, so random access to any cell is a rank computation
+followed by a single value fetch, with no offset column to read and no binary
+search.
+
+Because different fields may cover different offset sets, the value columns of a
+single sparse overlay may have **different lengths**. The Lance file format
+permits columns of differing item counts within one file, so a sparse overlay is
+representable as a single file. (See [Writer support](#writer-support) for the
+current implementation status.)
+
+### Dense vs. sparse overlays
+
+A single overlay is one of two shapes:
+
+- **Dense (rectangular).** One `shared_offset_bitmap` applies to every field. Every
+ covered offset has a value for every field. This is the common case for a plain
+ `UPDATE`, where one `SET` list is applied to one set of rows.
+- **Sparse.** A `FieldCoverage` carries one bitmap per field, used when different
+ fields cover different offset sets — for example a `MERGE` with multiple
+ `WHEN MATCHED` branches, where different rows update different columns. A dense
+ overlay would have to widen to the bounding rectangle and fill the untouched
+ cells with their current values (post-images), which for wide columns such as
+ embeddings means re-storing data that did not change. A sparse overlay stores
+ exactly the changed cells.
+
+A writer may always express a non-rectangular update as **multiple dense overlays
+in one transaction** (one per coverage group) instead of a single sparse overlay.
+
+## Protobuf
+
+
+DataOverlayFile protobuf message
+
+```protobuf
+%%% proto.message.DataOverlayFile %%%
+```
+
+
+
+
+FieldCoverage protobuf message
+
+```protobuf
+%%% proto.message.FieldCoverage %%%
+```
+
+
+
+## Versioning and ordering
+
+Overlays reuse the dataset version as their ordering clock rather than
+introducing a separate generation counter.
+
+`committed_version` is the dataset version at which an overlay **became
+effective** — the version of the commit that introduced it, **not** the version
+it was read from. It is stamped at commit time and re-stamped if the commit is
+retried, in the same way as the created-at / last-updated-at version sequences.
+
+This single value drives every ordering decision:
+
+- **Overlay vs. overlay** (read precedence): higher `committed_version` wins.
+- **Overlay vs. index** (query correctness): an index records the
+ `dataset_version` it was built from. An index whose `dataset_version >=
+ committed_version` already incorporates the overlay. An overlay whose
+ `committed_version > index.dataset_version` is newer than the index and its
+ cells must be excluded from index results and re-evaluated.
+- **Scheduler signal**: the gap between an overlay's `committed_version` and an
+ index's `dataset_version`, or between an overlay and the base, is a staleness
+ measure the compaction scheduler can use.
+
+!!! note "Why effective version, not read version"
+
+ Suppose an overlay reads version 5 and commits at version 6, while an index
+ is built reading version 5 (before the overlay) and commits at version 7 with
+ `dataset_version = 5`. If the overlay stored its *read* version (5), the test
+ `5 > 5` is false, the row would not be excluded, and the index — which never
+ saw the overlay — would return a stale result. Storing the *effective*
+ version (6) makes `6 > 5` true, the cell is excluded and re-evaluated, and the
+ result is correct.
+
+## Index integration
+
+Building an index over a fragment that has overlays does **not** require dropping
+the fragment from the index's coverage. The fragment stays indexed, and the query
+path reconciles overlays at query time using an **exclusion set**.
+
+The exclusion set for an index on field `F` is the union of the coverage bitmaps,
+restricted to field `F`, of every overlay whose `committed_version >
+index.dataset_version`. The exclusion is **field-aware**: an overlay that touches
+only unrelated columns does not exclude anything from the index on `F`.
+
+The query then proceeds as:
+
+1. Run the index search as usual, producing candidate rows.
+2. Remove any candidate in the exclusion set. (Its indexed value may be stale.)
+3. **Re-evaluate** the excluded rows against their current values — the same flat
+ path already used for the unindexed tail of fragments. For a scalar predicate
+ this re-applies the filter; for a vector query it re-scores the row's current
+ vector. Rows that still match are added back to the result.
+
+Step 3 is what makes exclusion correct rather than merely safe: removing a row
+from index candidates without re-evaluating it would silently drop a row that
+should match under its new value.
+
+### Correctness invariant
+
+> For every indexed field `F` and every row offset `o` in a fragment the index
+> covers, the index's entry for `(o, F)` is trusted unless `o` is excluded.
+> `o` is excluded iff some overlay with `committed_version > index.dataset_version`
+> covers `(o, F)`.
+
+The write and compaction paths together preserve this:
+
+- **Writes** change a cell only by adding an overlay, and that overlay's
+ `committed_version` exceeds the version of any pre-existing index — so the
+ change is always covered by an exclusion.
+- **Compaction** may remove an overlay only if the index no longer relies on it
+ (see below).
+
+## Compaction
+
+Overlays accumulate read cost — every overlay is a bitmap to test and a possible
+file to open. Compaction bounds that cost in two modes:
+
+- **Overlay → overlay.** Merge several overlays into fewer, computing the
+ post-image per `(offset, field)` by walking the merged overlays newest-first.
+ The merged overlay takes the **maximum** `committed_version` of its inputs, so
+ the exclusion semantics are preserved. Indexes are unaffected. This is cheap and
+ does not touch the base.
+- **Overlay → base.** Fold overlays into a fresh base data file, computing the
+ post-image for every covered cell, then clear the overlays. The base is
+ complete, so every post-image is well defined. Overlay offsets are physical, so
+ they cannot survive a rewrite that reorders rows; folding therefore materializes
+ values rather than carrying overlays forward.
+
+!!! warning "Folding an indexed field must update its index"
+
+ An overlay→base fold removes the overlay, which removes the exclusion signal
+ that kept an index correct. Folding an overlay that covers an indexed field
+ `F` is therefore equivalent to a column rewrite of `F` and must, in the same
+ commit, either rebuild the index to a `dataset_version` at least the folded
+ overlay's `committed_version`, or remove the fragment from the index's
+ coverage so the rows fall to the flat path. Otherwise the index would serve
+ stale values with no overlay to exclude them. This is the same rule that
+ already governs rewriting a column that an index is built on.
+
+When a fragment with overlays is compacted by a row-rewriting operation
+(`RewriteRows`, which produces new fragments with new row addresses), the
+overlays are folded into the new base as part of the rewrite, and existing
+[fragment-reuse remapping](row_id_lineage.md) handles the row-address changes as
+it does today.
+
+## Row lineage
+
+An overlay write updates the `last_updated_at_version` of every covered row, so
+change-data-feed and time-travel queries observe the update. Because overlays are
+addressed by physical offset, they do **not** require stable row IDs to be
+enabled; lineage updates apply only when those features are on.
+
+## Worked example
+
+A table `users` with stable row IDs enabled and these fields:
+
+| field id | name | type |
+|----------|-----------|-------------------------|
+| 1 | id | `int32` (primary key) |
+| 2 | name | `utf8` |
+| 3 | age | `int32` |
+| 4 | embedding | `fixed_size_list`|
+
+Created at version 1 as a single fragment `0` with one base data file
+`data/file0.lance` holding all four columns. `physical_rows = 4`:
+
+| offset | id | name | age | embedding |
+|--------|----|-------|-----|------------------|
+| 0 | 1 | Alice | 30 | … |
+| 1 | 2 | Bob | 25 | … |
+| 2 | 3 | Carol | 40 | … |
+| 3 | 4 | Dave | 22 | … |
+
+A BTree scalar index on `age` is built at version 1, covering fragment `0`
+(`dataset_version = 1`).
+
+### Step 1 — write an overlay
+
+```sql
+UPDATE users SET age = 26 WHERE id = 2; -- Bob, offset 1
+```
+
+This touches one field (`age`) for one row, so the writer emits a dense overlay
+and commits it as version 2. Fragment `0` gains:
+
+```text
+DataOverlayFile {
+ data_file: { path: "data/overlay-.lance", fields: [3], column_indices: [0] }
+ coverage: shared_offset_bitmap = {1}
+ committed_version: 2
+}
+```
+
+The overlay file stores a single `age` column with one value, `[26]`, at
+rank `{1}.rank(1) = 0`. `last_updated_at_version[1]` is set to 2.
+
+### Step 2 — read
+
+`SELECT id, age FROM users` reads base ages `[30, 25, 40, 22]`. For `age`
+(field 3), the overlay covers offset 1, so `age[1]` is replaced with the overlay
+value at position `{1}.rank(1) = 0` → `26`. Result ages: `[30, 26, 40, 22]`.
+
+### Step 3 — index query
+
+```sql
+SELECT * FROM users WHERE age = 26;
+```
+
+The `age` index was built at `dataset_version = 1`; the overlay's
+`committed_version` is 2. Since `2 > 1`, the overlay's coverage for `age`, `{1}`,
+is the exclusion set for this query.
+
+- The index (built at v1) holds Bob's *old* `age = 25`, so a lookup for `26`
+ returns nothing from the index.
+- Offset 1 is in the exclusion set, so it is re-evaluated on the flat path. Its
+ current `age` (26, via the overlay) matches `age = 26`, so Bob is returned.
+
+The mirror case `WHERE age = 25` shows exclusion preventing a stale hit: the index
+returns offset 1 (stale `25`), but offset 1 is excluded, re-evaluated to `26`, and
+correctly dropped.
+
+### Step 4 — a second, non-rectangular write
+
+```sql
+MERGE INTO users USING staged ON users.id = staged.id
+WHEN MATCHED AND staged.kind = 'rename' THEN UPDATE SET name = staged.name -- Carol(2), Dave(3)
+WHEN MATCHED AND staged.kind = 'revec' THEN UPDATE SET embedding = staged.embedding -- Bob(1)
+```
+
+`name` is updated for offsets `{2, 3}` and `embedding` for offset `{1}` — different
+fields over different rows. This is a sparse overlay, committed as version 3:
+
+```text
+DataOverlayFile {
+ data_file: { path: "data/overlay-.lance", fields: [2, 4], column_indices: [0, 1] }
+ coverage: field_coverage { offset_bitmaps: [ {2,3}, {1} ] }
+ // name (field 2) ^ ^ embedding (field 4)
+ committed_version: 3
+}
+```
+
+The file's `name` column has **two** values (`["Caroline", "David"]`, at
+ranks 0 and 1 of `{2,3}`) and its `embedding` column has **one** value (at rank 0
+of `{1}`) — columns of different lengths in one file.
+
+### Step 5 — read after the second write
+
+`SELECT name, age, embedding FROM users` resolves each field independently,
+newest overlay first:
+
+- `name`: the v3 overlay covers `{2,3}` → `["Alice", "Bob", "Caroline", "David"]`.
+- `age`: the v3 overlay does not cover `age`; the v2 overlay still applies at
+ offset 1 → `[30, 26, 40, 22]`.
+- `embedding`: the v3 overlay covers `{1}` → Bob's vector is the new one, others
+ from base.
+
+Overlays from different versions coexist and apply per field.
+
+### Step 6 — compaction (overlay → base)
+
+The scheduler folds both overlays into fragment `0` at version 4, computing
+post-images for `age`, `name`, and `embedding`, and writing a new base data file
+`data/file1.lance` with those columns. In the old file, fields 2, 3, and 4 are
+tombstoned (`-2`); field 1 (`id`) remains. The fragment's `overlays` list is
+cleared. Row addresses are preserved (a column rewrite, not a row rewrite), so
+stable row IDs and the deletion vector are untouched.
+
+Because the fold removed the overlay that was excluding offset 1 from the `age`
+index, the same commit must reconcile that index: either rebuild it at
+`dataset_version >= 2`, or drop fragment `0` from its coverage so `age` queries
+fall to the flat path. After a rebuild at version 4, no overlay remains and the
+`age` index directly returns `26` for Bob with no exclusion needed.
+
+## Guidance
+
+!!! note "This section is a stub."
+
+ The following are implementation considerations, not part of the on-disk
+ specification.
+
+### When to overlay vs. rewrite a column vs. move rows
+
+*(To be expanded.)* The choice between appending an overlay, rewriting a full
+column (data evolution), and moving updated rows to a new fragment depends on the
+fraction of rows changed, the fraction of columns changed, column width, the
+presence of indexes on the changed columns, and the accumulated overlay read
+cost. Roughly: few rows changed favors overlays; most rows in a few columns
+favors a column rewrite; most columns changed favors moving rows to a new
+fragment.
+
+### Writer support
+
+*(To be expanded.)* Dense (rectangular) overlays write with the existing
+equal-length file writer today. Sparse overlays stored as a **single** file
+require the writer to emit columns of independent lengths, which the current v2
+writer does not yet do (it advances all columns from one global row counter).
+Until that support lands, a writer can express a sparse update as multiple dense
+overlays in one transaction.
+
+### Scheduling compaction
+
+*(To be expanded.)* The overlay→overlay and overlay→base modes have very
+different costs; a cost/benefit scheduler decides when each is worthwhile, using
+the version gap as a staleness signal.
+
+### Open questions
+
+*(To be resolved.)*
+
+- **Per-fragment vs. per-table overlays.** Overlays are attached per fragment.
+ Should there be a table-level overlay concept, and how would it interact with
+ fragment-level row addressing?
+- **Relationship to LSM.** Overlays plus compaction resemble an LSM tree (newest
+ layer wins, periodic merge). How far should that analogy be taken, and what do
+ we deliberately do differently given Lance's random-access requirements?
+- **Coverage bitmap spill.** Coverage bitmaps live inline in the manifest. Very
+ large coverage (an overlay touching many rows) may warrant external spill, as
+ the row-ID and last-updated-at sequences already do above a size threshold.
+
+## Related specifications
+
+- [Table format overview](index.md)
+- [Transactions](transaction.md)
+- [Row ID & Lineage](row_id_lineage.md)
+- [Index Formats](../index/index.md)
+- [Format Versioning](versioning.md)
diff --git a/docs/src/format/table/index.md b/docs/src/format/table/index.md
index 94ea4b90dc9..ce4d0b26613 100644
--- a/docs/src/format/table/index.md
+++ b/docs/src/format/table/index.md
@@ -168,6 +168,35 @@ However, this invalidates row addresses and requires rebuilding indices, which c
+## Data Overlay Files
+
+!!! note "Overlay files require feature flag 64 (data overlay files)"
+
+Overlay files supply new values for a subset of `(row offset, field)` cells within
+a fragment without rewriting the base data files. They make updates cheap when only
+a small percentage of rows and/or columns change: a writer appends a small file
+carrying just the changed cells instead of rewriting whole columns or moving rows
+to a new fragment.
+
+On read, each cell is resolved by consulting the fragment's overlays from newest to
+oldest; the first overlay covering that `(offset, field)` wins, otherwise the value
+falls through to the base data file. Indices keep covering the fragment and reconcile
+overlays at query time through a field-aware exclusion set.
+
+For the full specification — coverage and resolution rules, dense vs. sparse layout,
+versioning, index integration, compaction, and a worked example — see the
+[Data Overlay Files Specification](data_overlay_file.md).
+
+
+DataOverlayFile protobuf message
+
+```protobuf
+%%% proto.message.DataOverlayFile %%%
+```
+
+
+
+
## Related Specifications
### Storage Layout
diff --git a/protos/table.proto b/protos/table.proto
index d298809d5d8..cc8b477a6a6 100644
--- a/protos/table.proto
+++ b/protos/table.proto
@@ -113,6 +113,11 @@ message Manifest {
// * 2: row ids are stable and stored as part of the fragment metadata.
// * 4: use v2 format (deprecated)
// * 8: table config is present
+ // * 16: data files use multiple base paths (shallow clone / multi-base)
+ // * 32: the transaction file under _transactions is not written (inline only)
+ // * 64: data overlay files are present (see DataOverlayFile). Readers that do
+ // not understand overlays must refuse the dataset, since ignoring an overlay
+ // would silently return stale base values.
uint64 reader_feature_flags = 9;
// Feature flags for writers.
@@ -311,6 +316,15 @@ message DataFragment {
repeated DataFile files = 2;
+ // Optional overlay files for this fragment, which supply new values for a
+ // subset of cells without rewriting the base data files. This MUST be empty
+ // if the data overlay files feature flag (64) is not set in the manifest.
+ //
+ // Order is significant: a later entry is newer than an earlier one. When two
+ // overlays cover the same (offset, field) and share a `committed_version`, the
+ // later entry wins. See DataOverlayFile for the full resolution rules.
+ repeated DataOverlayFile overlays = 11;
+
// File that indicates which rows, if any, should be considered deleted.
DeletionFile deletion_file = 3;
@@ -433,6 +447,66 @@ message DataFile {
optional uint32 base_id = 7;
} // DataFile
+// An overlay file supplies new values for a subset of (row offset, field) cells
+// within a fragment, without rewriting the fragment's base data files. It is
+// used for efficient updates when only a small fraction of rows and/or columns
+// change.
+//
+// On read, a cell is resolved by consulting the fragment's overlays from newest
+// to oldest: the first overlay that covers that (offset, field) wins; if none
+// cover it, the value falls through to the base data file. Because deletions
+// take precedence over overlays, an overlay value for an offset that is also
+// marked deleted is dead and is ignored.
+//
+// The overlay's data file does NOT store a row-offset key column. Within a value
+// column, the position of a covered offset's value is the rank (0-based count of
+// set bits below it) of that offset within the field's coverage bitmap. Because
+// fields may cover different offset sets, the value columns of a single overlay
+// data file may have different lengths (which the Lance file format permits).
+message DataOverlayFile {
+ // The data file storing the overlay's new cell values, one value column per
+ // field in `data_file.fields`. No row-offset key column is stored.
+ DataFile data_file = 1;
+
+ // Which (offset, field) cells this overlay provides values for.
+ oneof coverage {
+ // A single 32-bit Roaring bitmap of physical row offsets that applies to
+ // every field in `data_file.fields` (a "dense" / rectangular overlay).
+ // Every covered offset has a value for every field. This is the common case
+ // for a plain UPDATE, where one SET list is applied to one set of rows.
+ bytes shared_offset_bitmap = 2;
+ // Per-field coverage for a "sparse" overlay, used when different fields cover
+ // different offset sets (e.g. a MERGE with multiple WHEN MATCHED branches).
+ FieldCoverage field_coverage = 4;
+ }
+
+ // The dataset version at which this overlay became effective: the version of
+ // the commit that introduced it, NOT the version it was read from. It is
+ // stamped at commit time and re-stamped if the commit is retried, in the same
+ // way as the created-at / last-updated-at version sequences.
+ //
+ // This drives two orderings:
+ // * Versus index builds: an index whose `dataset_version` >= this value
+ // already incorporates this overlay. Otherwise the overlay's covered cells
+ // are excluded from index results for the affected fields and re-evaluated
+ // against their current values (see the Data Overlay Files specification).
+ // * Versus other overlays: when two overlays cover the same (offset, field),
+ // the one with the higher `committed_version` wins. Overlays that share a
+ // `committed_version` are ordered by their position in
+ // `DataFragment.overlays`, where a later entry is newer and wins.
+ uint64 committed_version = 3;
+}
+
+// Per-field coverage for a sparse overlay.
+message FieldCoverage {
+ // One entry per field in the overlay's `data_file.fields`, in the same order.
+ // Each is a 32-bit Roaring bitmap of the physical row offsets covered for that
+ // field. An offset present in a field's bitmap but mapped to a NULL value
+ // means the cell is overridden to NULL (distinct from an offset that is absent,
+ // which falls through to the base data file).
+ repeated bytes offset_bitmaps = 1;
+}
+
// Deletion File
//
// The path of the deletion file is constructed as: