feat: allow tuning miniblock value chunks to 32k #7356

Merged: Xuanwo merged 2 commits into main from xuanwo/allow-32k-miniblock-tuning on Jun 18, 2026

@Xuanwo (Collaborator) commented Jun 18, 2026

This allows miniblock writers to use up to 32K logical values per chunk when explicitly configured via LANCE_MINIBLOCK_MAX_VALUES, while keeping the default at 4096.
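A minimal sketch of how such an override might be resolved, assuming an unparsable or out-of-range setting is rejected rather than clamped (the description does not say which). `resolve_max_values` and both constant names are hypothetical and are not taken from `primitive.rs`; only the default of 4096, the 32K ceiling, and the `LANCE_MINIBLOCK_MAX_VALUES` name come from the description above.

```rust
/// Default number of logical values per miniblock chunk (from the PR description).
const DEFAULT_MAX_VALUES: u64 = 4096;
/// Largest value the writer-side guard accepts after this PR: 32Ki = 2^15.
const MAX_ALLOWED_VALUES: u64 = 1 << 15;

/// Hypothetical helper: resolve the per-chunk value limit from an optional
/// override string. `None` means the variable is unset, so the default applies.
/// Assumption: invalid input is reported as an error instead of being clamped.
fn resolve_max_values(raw: Option<&str>) -> Result<u64, String> {
    let text = match raw {
        None => return Ok(DEFAULT_MAX_VALUES),
        Some(text) => text.trim(),
    };
    let value: u64 = text
        .parse()
        .map_err(|e| format!("invalid chunk size {:?}: {}", text, e))?;
    if value == 0 || value > MAX_ALLOWED_VALUES {
        return Err(format!(
            "chunk size {} is outside 1..={}",
            value, MAX_ALLOWED_VALUES
        ));
    }
    Ok(value)
}
```

A caller could pass `std::env::var("LANCE_MINIBLOCK_MAX_VALUES").ok().as_deref()` as the argument. Taking an `Option<&str>` keeps the helper testable without touching the process environment.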

The file format already stores log_num_values in 4 bits, so the writer-side guard can allow log_num_values up to 15 (2^15 = 32,768 values per chunk) without requiring the large-chunk metadata path. The compressed byte-size limits remain enforced.
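The arithmetic behind the 4-bit argument can be sketched as follows. This assumes chunk sizes are powers of two so that they can be represented by their base-2 logarithm; `log_num_values_for` and the constant names are hypothetical, and the sketch does not model the byte-size limits or the large-chunk metadata path.

```rust
/// Width of the log_num_values field, per the PR description.
const LOG_NUM_VALUES_BITS: u32 = 4;
/// Largest exponent a 4-bit field can hold: 2^4 - 1 = 15.
const MAX_LOG_NUM_VALUES: u32 = (1 << LOG_NUM_VALUES_BITS) - 1;

/// Hypothetical helper: return the exponent to store for a chunk of
/// `num_values` values, or `None` if the size is not a power of two or
/// its exponent does not fit in the 4-bit field.
fn log_num_values_for(num_values: u64) -> Option<u32> {
    if !num_values.is_power_of_two() {
        return None;
    }
    // For a power of two, the number of trailing zero bits is log2.
    let log = num_values.trailing_zeros();
    if log <= MAX_LOG_NUM_VALUES {
        Some(log)
    } else {
        None
    }
}
```

Under these assumptions the default of 4096 maps to an exponent of 12 and 32,768 maps to 15, the largest value the field can hold, while 65,536 would need a fifth bit.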

Fixes #7326.

@github-actions (Contributor)

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions Bot added the enhancement, A-encoding, and A-format labels Jun 18, 2026
@Xuanwo Xuanwo marked this pull request as ready for review June 18, 2026 09:37
@Xuanwo Xuanwo requested a review from westonpace June 18, 2026 09:37
@codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 71.69811% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../lance-encoding/src/encodings/logical/primitive.rs | 67.39% | 14 Missing and 1 partial ⚠️ |


@westonpace (Member) left a comment

I think bumping the max up to 32Ki is fine. I think there is some small chance we might end up seeing the "too much rep/def information and it doesn't all fit in the miniblock" error again but we can tackle that problem a different way when it surfaces.

Comment thread rust/lance-encoding/src/encodings/logical/primitive.rs Outdated
@Xuanwo Xuanwo merged commit 2878189 into main Jun 18, 2026
29 of 30 checks passed
@Xuanwo Xuanwo deleted the xuanwo/allow-32k-miniblock-tuning branch June 18, 2026 16:37

Development

Successfully merging this pull request may close these issues.

Support 32K miniblock chunks
