
fix: compact blob v2 datasets via rewrite fallback #38

Open
everySympathy wants to merge 1 commit into main from blobv2/compaction

everySympathy (Collaborator) commented Jun 22, 2026

Summary

Adds a BlobV2-safe compaction fallback to daft-lance. When a dataset contains BlobV2 columns, compaction rewrites the dataset's visible rows instead of taking the existing compaction path, which cannot safely handle BlobV2 columns.
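A minimal sketch of that dispatch decision, assuming a simplified schema in which each field carries a hypothetical `is_blob_v2` flag. `FieldInfo`, `blob_v2_columns`, and `choose_compaction_path` are illustrative names, not daft or Lance APIs; the real detection presumably inspects Lance schema metadata, which this PR does not show.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a schema field; not a daft or Lance type."""
    name: str
    is_blob_v2: bool = False


def blob_v2_columns(schema: List[FieldInfo]) -> List[str]:
    # Collect BlobV2 column names in schema order.
    return [f.name for f in schema if f.is_blob_v2]


def choose_compaction_path(schema: List[FieldInfo]) -> str:
    # Any BlobV2 column forces the rewrite fallback; otherwise keep the existing path.
    return "rewrite" if blob_v2_columns(schema) else "native"


schema = [FieldInfo("id"), FieldInfo("image", is_blob_v2=True)]
print(choose_compaction_path(schema))  # rewrite
```

Checking the schema up front keeps datasets without BlobV2 columns on the existing path, so only BlobV2 datasets pay for a full rewrite.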

Changes

  • Detects BlobV2 columns before running compaction.
  • Uses take_blobs to materialize BlobV2 values for visible rows.
  • Rewrites the dataset through a fallback path that preserves logical row contents.
  • Adds tests covering BlobV2 compaction behavior.
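The rewrite steps above can be sketched in-memory as follows. This assumes a simplified model where a fragment is a list of row dicts plus a set of deleted local indices, and `take_blobs` is a hypothetical callable `(column, row_ids) -> list[bytes]` standing in for the take_blobs step; its signature here is not Lance's API. The sketch only illustrates the invariant the PR describes: visible rows keep their order and logical contents, and blob handles are replaced by materialized bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical signature: (column name, global row ids) -> blob bytes in the same order.
TakeBlobs = Callable[[str, Sequence[int]], List[bytes]]


@dataclass
class Fragment:
    """Simplified fragment: row payloads plus locally deleted row indices."""
    rows: List[Dict[str, object]]
    deleted: Set[int] = field(default_factory=set)


def visible_row_ids(fragments: List[Fragment]) -> List[int]:
    # Global ids of non-deleted rows, in storage order.
    ids, offset = [], 0
    for frag in fragments:
        ids.extend(offset + i for i in range(len(frag.rows)) if i not in frag.deleted)
        offset += len(frag.rows)
    return ids


def rewrite_compact(
    fragments: List[Fragment],
    blob_cols: List[str],
    take_blobs: TakeBlobs,
    rows_per_fragment: int,
) -> List[Fragment]:
    if rows_per_fragment <= 0:
        raise ValueError("rows_per_fragment must be positive")
    all_rows = [row for frag in fragments for row in frag.rows]
    ids = visible_row_ids(fragments)
    # Fetch each blob column once for all visible rows.
    blobs = {col: take_blobs(col, ids) for col in blob_cols}
    out = []
    for pos, rid in enumerate(ids):
        row = dict(all_rows[rid])  # copy so the input fragments are untouched
        for col in blob_cols:
            row[col] = blobs[col][pos]  # replace the blob handle with its bytes
        out.append(row)
    # Re-chunk into dense fragments with no deletions.
    return [Fragment(out[i:i + rows_per_fragment]) for i in range(0, len(out), rows_per_fragment)]
```

Fetching each blob column in one batched call, rather than row by row, mirrors the intent of a bulk take; whether the real fallback batches this way is an assumption.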

Testing

  • uv run --python /usr/bin/python3.11 pytest tests/io/lancedb/test_lancedb_compaction.py -q
    • Result: 6 passed
  • Lint/type checks for touched files passed locally.

Notes / Risks

  • This is intentionally a rewrite fallback, not native in-place BlobV2 compaction.
  • External/reference blobs are materialized into managed bytes by this fallback.
  • Native BlobV2-aware compaction can be added later when Lance exposes a safer direct path.
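To illustrate what "materialized into managed bytes" means for external/reference blobs, here is a sketch assuming two hypothetical descriptor shapes (`InlineBlob`, `ExternalBlob`) and a hypothetical `read_range(uri, offset, length)` reader. These are not Lance's BlobV2 layout or API; they only show that after the fallback the bytes live in the dataset and the original reference is not kept.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class InlineBlob:
    """Hypothetical descriptor: bytes already stored in the dataset."""
    data: bytes


@dataclass(frozen=True)
class ExternalBlob:
    """Hypothetical descriptor: a byte range in an object outside the dataset."""
    uri: str
    offset: int
    length: int


BlobRef = Union[InlineBlob, ExternalBlob]
# Hypothetical reader: (uri, offset, length) -> bytes.
ReadRange = Callable[[str, int, int], bytes]


def materialize(blob: BlobRef, read_range: ReadRange) -> bytes:
    # Inline blobs are returned as-is; external ones are read and copied in,
    # so the rewritten dataset no longer points at the original object.
    if isinstance(blob, InlineBlob):
        return blob.data
    data = read_range(blob.uri, blob.offset, blob.length)
    if len(data) != blob.length:
        raise IOError(f"short read from {blob.uri}: got {len(data)} of {blob.length} bytes")
    return data
```

Under this model, a dataset that referenced large external objects can grow by the total size of the referenced ranges after compaction.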
