Skip to content

feat: add blob v2 columns through merge_columns_df#39

Draft
everySympathy wants to merge 1 commit into
mainfrom
blobv2/add-columns
Draft

feat: add blob v2 columns through merge_columns_df#39
everySympathy wants to merge 1 commit into
mainfrom
blobv2/add-columns

Conversation

@everySympathy

Copy link
Copy Markdown
Collaborator

Summary

Adds support for adding BlobV2 columns through merge_columns_df by allowing callers to mark selected new columns as BlobV2 payload columns.

Changes

  • Adds a blob_columns parameter to daft_lance.merge_columns_df / merge_columns_from_df.
  • Converts marked columns with lance.blob_array(...) before merging.
  • Routes BlobV2 add-column requests through the existing keyed merge slow path.
  • Adds test coverage that creates a Lance 2.2 dataset, adds a BlobV2 column, and verifies the stored payload through take_blobs.

Testing

  • uv run --python /usr/bin/python3.11 pytest tests/io/lancedb/test_fast_path_merge.py::TestBasicCorrectness::test_fast_path_adds_blob_v2_column -v
    • Result: passed, EXIT:0

Additional investigation:

  • Full test_fast_path_merge.py on this branch:
    • Result: 38 passed, then process exits 139
  • Excluding the new BlobV2 add-column test on this branch:
    • Result: 37 passed, 1 deselected, then process exits 139
  • Same full file on main/baseline:
    • Result: 37 passed, then process exits 139
  • A standalone fast-path merge script also reproduces EXIT:139.
  • The same standalone scenario forced through slow path exits 0.

Notes / Risks

The 139 is a pre-existing native teardown / heap corruption issue in the current fast-path merge tests, not introduced by this BlobV2 add-column change.

The BlobV2 add-column implementation intentionally bypasses fast path because the current raw LanceFileWriter + manually stitched FragmentMetadata fast path is the path associated with the native exit crash. This PR should either be submitted as draft with the known fast-path stability issue called out, or submitted after a separate tracking issue is filed for the existing fast-path crash.

Suggested tracking issue:

test_fast_path_merge.py fast path causes native heap corruption on interpreter exit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant