Add configurable V1 blob encoding for large payload columns #49

Merged: beinan merged 4 commits into lance-format:main from beinan:feat/blob-encoding on Jun 7, 2026

beinan commented Jun 7, 2026

Collaborator

Summary

  • Add blob_columns: HashSet<String> to ContextStoreOptions to let users opt columns into Lance V1 blob encoding
  • Valid columns: text_payload and binary_payload — blob-encoded columns store data in out-of-line buffers for efficient storage of large/unpredictable content
  • batch_to_records() auto-detects column types from the batch schema, preserving backward compatibility with existing non-blob datasets
  • Python bindings expose blob_columns parameter on Context.create()
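
The column-name rule in the summary can be sketched in a few lines. This is a hypothetical pure-Python model, not the crate's code: `VALID_BLOB_COLUMNS` and `validate_blob_columns` are names invented here for illustration, and the real check lives in the Rust core, whose error type and message may well differ.

```python
# Hypothetical model of the blob_columns check described in the summary.
# These names are illustrative only; they are not part of lance-context.
VALID_BLOB_COLUMNS = frozenset({"text_payload", "binary_payload"})


def validate_blob_columns(blob_columns):
    """Return the requested columns as a set, rejecting unknown names."""
    requested = set(blob_columns)
    unknown = sorted(requested - VALID_BLOB_COLUMNS)
    if unknown:
        raise ValueError(f"unsupported blob columns: {unknown}")
    return requested
```

An empty collection passes and yields an empty set, which matches the stated default of no blob encoding.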

Usage

// Rust
use std::collections::HashSet;

// Opt both payload columns into blob encoding; all other options keep their defaults.
let options = ContextStoreOptions {
    blob_columns: HashSet::from(["binary_payload".into(), "text_payload".into()]),
    ..Default::default()
};
let store = ContextStore::open_with_options(uri, options).await?;

# Python
ctx = Context.create("./store", blob_columns=["binary_payload", "text_payload"])

Test plan

  • test_blob_binary_payload — roundtrip with blob-encoded binary_payload
  • test_blob_text_payload — roundtrip with blob-encoded text_payload (LargeUtf8 → LargeBinary)
  • test_blob_both_columns — both columns blob-encoded simultaneously
  • test_no_blob_default — default (no blob) schema unchanged
  • test_blob_schema_metadata — verifies lance-encoding:blob metadata on fields
  • test_blob_invalid_column_name — rejects unknown column names
  • test_batch_to_records_autodetects_text_type — auto-detection works for both LargeUtf8 and LargeBinary batches
  • All 15 tests pass (cargo test -p lance-context-core)
  • Python bindings compile (cargo check -p lance-context-python)
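
The auto-detection exercised by test_batch_to_records_autodetects_text_type can be pictured as a branch on the column's Arrow type. The sketch below models that idea with plain Python values rather than Arrow arrays. The type-name strings and `decode_text_payload` are stand-ins invented for illustration, and treating blob-encoded text as UTF-8 bytes is an assumption inferred from the LargeUtf8 → LargeBinary note, not something the PR states.

```python
def decode_text_payload(type_name, value):
    """Return text for a text_payload cell stored under either layout.

    type_name is a stand-in label for the column's Arrow data type:
    "large_utf8" for the default schema, "large_binary" when the column
    was opted into blob encoding (assumed here to hold UTF-8 bytes).
    """
    if value is None:
        # Null cells stay null regardless of the storage type.
        return None
    if type_name == "large_utf8":
        return value
    if type_name == "large_binary":
        return bytes(value).decode("utf-8")
    raise TypeError(f"unexpected text_payload type: {type_name}")
```

Branching on the schema rather than on a store-level flag is what lets existing non-blob datasets be read unchanged.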

🤖 Generated with Claude Code

Beinan Wang added 4 commits June 6, 2026 21:37
Enable Lance V1 blob encoding via `blob_columns` option in ContextStoreOptions.
Supports `text_payload` and `binary_payload` columns. Blob-encoded columns store
data in out-of-line buffers for efficient storage of large/unpredictable content.

Co-Authored-By: Beinan Wang <beinanwang@microsoft.com>
@beinan beinan marked this pull request as ready for review June 7, 2026 07:58
@beinan beinan merged commit 998ca18 into lance-format:main Jun 7, 2026
7 checks passed
