
feat: add RLE v2 run length widths #7376

Draft
Xuanwo wants to merge 2 commits into main from xuanwo/rle-v2-run-length-widths


Xuanwo (Collaborator) commented Jun 19, 2026

Summary

Adds RLE v2 run-length widths so that newly created datasets can write RLE pages with u16 or u32 run lengths instead of splitting every long run into chunks of at most 255 values. The capability is recorded as a reader feature flag and is enabled only when a new dataset is created with WriteParams::enable_rle_v2; existing datasets without the flag reject attempts to enable it on later writes.
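To make the width change concrete, here is a minimal Rust sketch. RunWidth, width_for_page, split_run, and check_enable_rle_v2 are hypothetical names, not Lance's API, and choosing one width per page from its longest run is an assumption. The sketch only shows why a u8 length field forces long runs to be split at 255 and how a wider field avoids that, plus the creation-time gate described above.

```rust
// Hypothetical sketch: not Lance's encoder or on-disk layout.

/// Width of the run-length field in an RLE page (assumed set of options).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunWidth {
    U8,
    U16,
    U32,
}

impl RunWidth {
    /// Largest run length a single entry of this width can store.
    fn max_len(self) -> u64 {
        match self {
            RunWidth::U8 => u8::MAX as u64,
            RunWidth::U16 => u16::MAX as u64,
            RunWidth::U32 => u32::MAX as u64,
        }
    }
}

/// Assumed policy: pick the narrowest width that holds the longest run in a page.
fn width_for_page(longest_run: u64) -> RunWidth {
    if longest_run <= RunWidth::U8.max_len() {
        RunWidth::U8
    } else if longest_run <= RunWidth::U16.max_len() {
        RunWidth::U16
    } else {
        RunWidth::U32
    }
}

/// Split one logical run into (value, length) entries whose lengths fit `width`.
fn split_run<T: Copy>(value: T, run_len: u64, width: RunWidth) -> Vec<(T, u64)> {
    let max = width.max_len();
    let mut entries = Vec::new();
    let mut remaining = run_len;
    while remaining > 0 {
        let chunk = remaining.min(max);
        entries.push((value, chunk));
        remaining -= chunk;
    }
    entries
}

/// Hypothetical gate: RLE v2 is allowed when creating a dataset, or when the
/// existing dataset already carries the reader feature flag.
fn check_enable_rle_v2(creating_dataset: bool, has_feature_flag: bool) -> Result<(), String> {
    if creating_dataset || has_feature_flag {
        Ok(())
    } else {
        Err("RLE v2 cannot be enabled on an existing dataset without the feature flag".to_string())
    }
}

fn main() {
    // Assumes 150M rows spread evenly over 5k assets: 30,000 rows per asset.
    let run_len: u64 = 150_000_000 / 5_000;

    let v1_entries = split_run("asset-42", run_len, RunWidth::U8);
    let width = width_for_page(run_len);
    let v2_entries = split_run("asset-42", run_len, width);
    println!("u8: {} entries; {:?}: {} entries", v1_entries.len(), width, v2_entries.len());

    // Creation-time opt-in succeeds; an unflagged existing dataset is rejected.
    assert!(check_enable_rle_v2(true, false).is_ok());
    assert!(check_enable_rle_v2(false, false).is_err());
}
```

Under that even-distribution assumption (real data will vary), a sorted asset_id run of 30,000 rows needs 118 entries with a u8 length field but fits in a single u16 entry.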

Closes #7327.

Benchmark

Ran on xuanwo-lance-lazy-metadata-bench with a #6941-style sorted low-cardinality asset_id workload.

| Workload | Lance default | Lance RLE2 | Reduction |
|---|---|---|---|
| 150M rows / 5k assets / random5 value | 167.36 MiB | 164.57 MiB | 1.67% |
| 150M rows / 5k assets / by-asset5 value | 7.62 MiB | 2.03 MiB | 73.34% |

The first row keeps the random low-cardinality value column from the issue-like workload; that column dominates total size, so the overall reduction is small. The second row isolates the long-run case that RLE2 targets.
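The reduction column appears to be computed as (default - RLE2) / default. Assuming that formula, this small Rust check recomputes it from the MiB figures in the table; reduction_pct is a hypothetical helper written for this check.

```rust
/// Percentage size reduction of `rle2_mib` relative to `default_mib`.
fn reduction_pct(default_mib: f64, rle2_mib: f64) -> f64 {
    (default_mib - rle2_mib) / default_mib * 100.0
}

fn main() {
    // (workload, Lance default MiB, Lance RLE2 MiB) copied from the table.
    let rows = [
        ("150M rows / 5k assets / random5 value", 167.36, 164.57),
        ("150M rows / 5k assets / by-asset5 value", 7.62, 2.03),
    ];
    for (workload, default_mib, rle2_mib) in rows {
        println!("{}: {:.2}%", workload, reduction_pct(default_mib, rle2_mib));
    }
}
```

This yields 1.67% and 73.36%. The small gap from the reported 73.34% is presumably because the MiB values in the table are rounded.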

Validation

Validated with focused RLE2 tests and a full Rust clippy run before publishing.

github-actions bot added the A-encoding (Encoding, IO, file reader/writer), A-format (On-disk format: protos and format spec docs), and A-namespace (Namespace impls) labels on Jun 19, 2026
github-actions (Contributor) commented:

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

github-actions bot added the enhancement (New feature or request) label on Jun 19, 2026


Development

Successfully merging this pull request may close these issues.

Add opt-in RLE v2 run-length widths
