
fix(reader): gracefully handle missing Parquet column index in row se…#2464

Open
jdwil wants to merge 1 commit into apache:main from jdwil:fix/2452-graceful-missing-column-index

jdwil commented May 18, 2026

Gracefully handle Parquet files missing column/offset indexes by skipping page-level row selection and falling back to existing row-group filtering plus Arrow row filtering. This preserves predicate correctness for older or migrated Parquet files that lack page index metadata.
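The fallback described above can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: `PageIndex`, `PagePruning`, and `plan_page_pruning` are hypothetical names, and the predicate is hard-wired to `value < bound` to keep the example small.

```rust
/// Hypothetical stand-in for a file's page index metadata.
/// This is NOT a type from the parquet crate.
#[derive(Debug, Clone, PartialEq)]
struct PageIndex {
    /// (min, max) statistics for each page of the filtered column.
    pages: Vec<(i64, i64)>,
}

/// Outcome of planning page-level pruning for one file.
#[derive(Debug, PartialEq)]
enum PagePruning {
    /// One keep/skip flag per page, derived from page statistics.
    Selected(Vec<bool>),
    /// No page-level pruning: later filters see every page.
    Skipped,
}

/// Plan page pruning for the predicate `value < bound`.
fn plan_page_pruning(enabled: bool, index: Option<&PageIndex>, bound: i64) -> PagePruning {
    match (enabled, index) {
        // Index present: keep a page only if some value in it can satisfy the predicate.
        (true, Some(idx)) => PagePruning::Selected(
            idx.pages.iter().map(|&(min, _max)| min < bound).collect(),
        ),
        // Index missing (or feature disabled): skip pruning instead of returning an error.
        _ => PagePruning::Skipped,
    }
}
```

The design point is that a missing index maps to `Skipped` rather than to an error, so the caller can continue with row-group and row-level filtering.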

Added integration coverage for Parquet files without column/offset indexes. The test verifies that scans no longer fail when page indexes are absent, page-level row selection is skipped gracefully, and predicate filtering (id < 3) still produces the correct filtered result set ([1, 2]) via the existing row-group + Arrow filtering path.
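As a rough illustration of why the result stays correct, the sketch below models page-level selection as an optional optimization layered on top of a row filter. The function `scan_with_row_filter` and its parameters are hypothetical and exist only for this example; the actual test in this PR runs against a real Parquet file.

```rust
/// Hypothetical model of the scan path for the predicate `id < bound`.
/// `row_selection` is `None` when page indexes are absent and pruning is skipped.
fn scan_with_row_filter(rows: &[i64], row_selection: Option<&[bool]>, bound: i64) -> Vec<i64> {
    rows.iter()
        .enumerate()
        // Page-level selection is only an optimization; `None` keeps every row.
        .filter(|&(i, _)| row_selection.map_or(true, |sel| sel.get(i).copied().unwrap_or(true)))
        // The row filter enforces the predicate on whatever rows remain.
        .filter(|&(_, &id)| id < bound)
        .map(|(_, &id)| id)
        .collect()
}
```

With ids `[1, 2, 3, 4]` and `bound = 3`, passing `None` for the selection yields `[1, 2]`, the same result a correct page-level selection would produce.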

Closes #2452

…lection

When row_selection_enabled is true and the Parquet file lacks column or
offset index metadata (common with older/migrated files), the reader now
skips page-level row pruning instead of returning an error.

Row-group filtering via statistics and the ArrowPredicate row filter
still function normally; only page-index-based RowSelection is skipped.

Closes apache#2452
jdwil force-pushed the fix/2452-graceful-missing-column-index branch from 3cf00d7 to f1a268a on May 18, 2026 12:05

Development

Successfully merging this pull request may close these issues.

Iceberg scan error: Parquet file metadata does not contain a column index
