Skip to content

perf: speed up cold read up to 8x by lazily load column metadata#7375

Open
Xuanwo wants to merge 5 commits into
mainfrom
xuanwo/lazy-column-metadata
Open

perf: speed up cold read up to 8x by lazily load column metadata#7375
Xuanwo wants to merge 5 commits into
mainfrom
xuanwo/lazy-column-metadata

Conversation

@Xuanwo

@Xuanwo Xuanwo commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR adds a lightweight file metadata index and a data-reader path that can load column metadata on demand for supported narrow projections. Dataset fragment reads now use this path for sparse top-level V2.1+ physical column projections and continue to fall back to full metadata when the projection is unsupported or the caller needs file-wide metadata.

The problem is that the current full-metadata path decodes every column's metadata before reading data. On ultra-wide files, a one-column scan or point lookup pays metadata cost proportional to the total column count.

Performance

Benchmarked on EC2 instance xuanwo-lance-lazy-metadata-bench with a 65,000-column Lance file and random single-column point lookups. The full-metadata reader is the origin/main behavior. Results below are mean latency.

Storage origin/main full metadata cold this PR lazy metadata cold Speedup Metadata read bytes IOPS
Local EBS 159.1 ms 19.3 ms 8.24x 10.388 MiB -> 0.998 MiB 4 -> 5
S3 us-east-2 368.4 ms 144.4 ms 2.55x 10.388 MiB -> 1.057 MiB 4 -> 5

Hot reopen reads are effectively unchanged because metadata/data page reads are already cached: Local EBS was 10.4 ms -> 9.6 ms, and S3 was 36.3 ms -> 35.7 ms.

The S3 cold path now issues only one additional request versus full metadata, not two, because dataset-known schema and row count are passed into the metadata-index reader instead of re-reading the schema global buffer.

Validation

Validated with targeted lazy metadata tests, the dropped-column statistics regression, cargo fmt --all, and full Rust clippy with tests and benches enabled.

@github-actions github-actions Bot added A-encoding Encoding, IO, file reader/writer performance labels Jun 19, 2026
@Xuanwo Xuanwo marked this pull request as ready for review June 19, 2026 15:00
@Xuanwo Xuanwo changed the title perf: lazily load column metadata perf: speed up cold read up to 8x by lazily load column metadata Jun 19, 2026
@westonpace

Copy link
Copy Markdown
Member

😆 I was just talking to Pedro and Xiaoxuan from Meta and they were surprised we haven't had to do this yet.

I guess the time has finally come 🎉

@westonpace westonpace left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary complaint is a naming complaint. If we want to proceed with FileDataReader then I can live with that. However, can we name it something like ProjectedFileReader so that it implies it is a reader that is scoped to a particular projection?

Attaching Claude's review below


Here is my review of PR #7375:


PR #7375perf: speed up cold read up to 8x by lazily load column metadata

Author: Xuanwo | Files changed: 4 | +1771 / -484

Overview

This PR introduces a two-tier metadata loading strategy for Lance V2.1+ files. Instead of eagerly decoding all column metadata upfront (the current behavior), it adds a FileMetadataIndex that reads only the CMO (column metadata offset) table, then fetches and decodes only the column metadata entries needed for the current projection. A new FileDataReader type wraps this path. The heuristic to engage lazy loading is selected_columns * 4 < total_columns (i.e., less than 25% of columns selected). The reported speedups (8x local, 2.5x S3) are compelling.


Correctness Concerns

P0 — storage_stats panics instead of returning an error:

fn storage_stats(&self) -> Vec<(u32, u64)> {
    let Some(file_statistics) = self.reader.file_statistics() else {
        panic!("storage_stats requires full file metadata");
    };

This panics at runtime with an unhelpful error if called through a code path that doesn't guarantee full metadata. The method doesn't return Result, so adding ? isn't trivial, but a debug_assert! plus a proper Result return would be much safer. Currently it only gets called from calculate_data_stats which calls open_readers_with_full_metadata first, but the invariant is fragile.

P1 — Cache key uses file_size_bytes as file identity proxy:

fn key(&self) -> Cow<'_, str> {
    Cow::Owned(format!("column_metadata/{}/{}", self.file_size_bytes, self.column_index))
}

LanceCache::with_key_prefix(path) already scopes the cache to the file path, so file_size_bytes in the key is redundant. It won't cause bugs now, but indicates a design confusion. Two files at the same path (after an overwrite) with the same size could theoretically collide. File modification time or a content hash would be safer, or just remove the size from the key and rely on path scoping alone.

P1 — Potential regression in Full path: prepare_projection returns all column_infos unfiltered:

Self::Full(metadata) => Ok(PreparedProjection {
    column_infos: metadata.column_infos.clone(),  // all columns, always
    ...
}),

The previous collect_columns_from_projection filtered to only the projected columns before passing to the decoder. It's not clear from the diff whether the downstream do_read_range/do_take_rows/do_read_ranges still filter based on decoder_projection, or whether they now receive and process all column_infos redundantly. If the latter, wide-table full-metadata reads will be slower due to the extra clone. This should be confirmed.


Design / Code Quality

  • FileDataReader naming is unclear — both FileReader and FileDataReader read data. LazyFileReader or IndexedMetadataFileReader would signal intent at the call site.
  • require_indexed_base_projection error message ("indexed column metadata reader requires an explicit base projection") surfaces implementation-internal language to users.
  • Duplicated read_range/take_rows/read_ranges dispatch in FileReadCore is repetitive; a small macro or helper could compress it.
  • The BoxFuture boxing in open_reader_impl and execute_uncommitted_impl is a correct fix for the recursive-async-fn limitation.

Test Coverage

The new tests are solid, especially test_lazy_reader_loads_only_requested_column_metadata (byte-level I/O measurement) and the fragment-level IOPS test. Gaps:

  • No test for cache hit path (second read on the same column should issue zero column-metadata I/O).
  • No test for the new validation in decode_cmo_table (out-of-range offsets on malformed files).
  • storage_stats panic path is not tested and is currently unreachable only by convention.

Minor Positives

  • The bounds check added to decode_column_metadatas (validates column_metadata_bytes.len() >= cmo_table_size) is a good defensive correctness improvement.
  • The overflow on position.checked_add(length)guard indecode_cmo_table` is correct.
  • Statistics regression test (test_calculate_data_stats_after_dropping_wide_dataset_columns) catches the drop-column edge case cleanly.
  • Double-gate heuristic (check before and after FileMetadataIndex load) is harmless.

Summary

The core algorithm is sound and the perf numbers are strong. Before merging, I'd address: (1) the storage_stats panic, (2) confirm the Full path column_infos clone doesn't regress wide-table reads, and (3) clarify or simplify the cache key. A cache hit test would be valuable.

/// and column metadata offset table. Unlike [`CachedFileMetadata`], it does not
/// hold decoded metadata for every column.
#[derive(Debug)]
pub struct FileMetadataIndex {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we derive DeepSize instead of a custom impl? I don't see anything in here missing a DeepSize impl.

Comment on lines +242 to +246
let retained_buffers_size: usize = self
.retained_global_buffers
.values()
.map(|buf| buf.len())
.sum();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly inaccurate (probably not enough to matter) because it doesn't include the BTreeMap's buffer.

+ std::mem::size_of_val(column_metadatas)
}

fn column_info_deep_size(column_info: &ColumnInfo) -> usize {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not impl (or derive) DeepSizeOf on ColumnInfo?

Comment on lines +542 to +546
/// A data reader for Lance files.
///
/// This reader can use either full file metadata or indexed per-column metadata
/// internally. It intentionally does not expose APIs that require synchronous
/// access to full file metadata.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't really explain why a caller would prefer this over FileReader (and they have very similar names and APIs)

We should explain that this struct establishes a base projection at construction time that cannot change. This means all reads are limited by the base projection but it also means that we only need to read the column metadata for the fields in the base projection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-encoding Encoding, IO, file reader/writer performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants