[HSTACK] Fix MAP schema evolution: allow additive value-struct changes #5
Open
DataFusion correctly handles Struct→Struct schema evolution (missing fields filled with nulls, extra fields ignored) but falls back to Arrow's generic cast for Map→Map. Since Arrow's generic cast cannot add missing struct fields, reading Delta tables where the MAP value struct has gained optional fields (additive evolution) fails with:

    Cannot cast column 'identityMap' from '...(physical data type)' to '...(logical data type)'

Fix:
1. schema_rewriter.rs, validate_compat match: add a Map arm that recursively calls validate_struct_compatibility on the internal key_value struct, so additive value-struct changes pass the compatibility check.
2. nested_struct.rs, cast_column match: add a Map arm via cast_map_column(), which casts the key_value StructArray through cast_column (preserving struct evolution semantics) and rebuilds the MapArray with the same offsets and validity bitmap.

Tests added:
- test_cast_map_with_evolved_value_struct: end-to-end cast of a MAP where the physical value struct (only "id") is evolved to the logical schema (id + primary + authenticatedState).
- test_validate_map_value_struct_compatibility: confirms the compatibility check passes for additive value-struct changes.

Real-world trigger: AEP identityMap is Map<String, List<Struct<…>>>. Older Parquet files have fewer fields in the Struct; newer files have more. DataFusion previously failed to read the older files.
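To make the cast semantics in item 2 concrete, here is a minimal std-only sketch. It is not the code from this PR and uses no Arrow types: `ToyMap`, `Row`, `row`, `cast_struct_row`, and `cast_toy_map` are hypothetical names, values are modeled as nullable strings, and the `List` layer of the real `identityMap` is omitted. It only illustrates the idea of rewriting the value structs while leaving offsets and validity untouched.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for one struct value: field name -> nullable string value.
type Row = BTreeMap<String, Option<String>>;

/// Hypothetical, simplified model of a map column: flattened entries plus
/// per-row offsets and one validity flag per row.
#[derive(Debug, Clone, PartialEq)]
struct ToyMap {
    /// Row `i` owns entries `offsets[i]..offsets[i + 1]`.
    offsets: Vec<usize>,
    /// `false` marks a null map row.
    validity: Vec<bool>,
    keys: Vec<String>,
    values: Vec<Row>,
}

/// Build a `Row` from `(name, value)` pairs.
fn row(fields: &[(&str, Option<&str>)]) -> Row {
    fields
        .iter()
        .map(|&(name, value)| (name.to_string(), value.map(|v| v.to_string())))
        .collect()
}

/// Struct evolution for one value: keep fields the target names, fill
/// target fields missing from the source with null, drop everything else.
fn cast_struct_row(source: &Row, target_fields: &[&str]) -> Row {
    target_fields
        .iter()
        .map(|&name| (name.to_string(), source.get(name).cloned().flatten()))
        .collect()
}

/// Map evolution: only the value structs are rewritten; offsets, validity,
/// and keys are carried over unchanged.
fn cast_toy_map(source: &ToyMap, target_fields: &[&str]) -> ToyMap {
    ToyMap {
        offsets: source.offsets.clone(),
        validity: source.validity.clone(),
        keys: source.keys.clone(),
        values: source
            .values
            .iter()
            .map(|r| cast_struct_row(r, target_fields))
            .collect(),
    }
}
```

Calling `cast_toy_map(&old, &["id", "primary", "authenticatedState"])` on a column whose values only carry `id` yields the same rows and keys, with `primary` and `authenticatedState` present but null.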
added 3 commits
April 6, 2026 12:27
Follow repo conventions: move test imports to mod-level, use field() helpers.
Problem
When a Delta table's logical schema has evolved additively, e.g. `identityMap: Map<String, List<Struct<id>>>` in older Parquet files vs `Map<String, List<Struct<id, primary, authenticatedState>>>` in the Delta log, DataFusion failed with:

    Cannot cast column 'identityMap' from '...(physical data type)' to '...(logical data type)'

The schema rewriter rejected the cast and the physical plan builder had no handler for MAP-to-MAP evolution.
Fix
`datafusion/common/src/nested_struct.rs`
- Adds a `DataType::Map` arm in `cast_column` that delegates to a new `cast_map_column()` function.
- `cast_map_column` casts the inner `key_value` StructArray through the existing `cast_column` logic, so missing nullable fields in the value struct are filled with nulls and extra source fields are ignored: the same additive evolution semantics already supported for plain structs.

`datafusion/physical-expr-adapter/src/schema_rewriter.rs`
- Adds a `(DataType::Map, DataType::Map)` arm to the compatibility check in `try_adapt_physical_expr` that validates the inner `key_value` struct fields via `validate_struct_compatibility`, allowing additive changes to pass instead of being rejected by `can_cast_types`.

Tests
- `test_cast_map_with_evolved_value_struct`: end-to-end cast of a real-world `identityMap: Map<String, List<Struct>>` where older files have fewer value-struct fields.
- `test_validate_map_value_struct_compatibility`: unit test for the schema_rewriter compatibility check on Map types.

Notes
The change is purely additive: the two new `match` arms intercept MAP types before the existing `_ =>` fallback; all other types are unaffected.
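The arm ordering described in this note can be sketched with a small std-only model. This is an illustration under stated assumptions, not DataFusion's code: `ToyType`, `ToyField`, `field`, and `is_compatible` are hypothetical names, the map is modeled as separate key and value types rather than Arrow's inner `key_value` struct, the fallback arm uses plain equality as a stand-in for `can_cast_types`, and rejecting a missing non-nullable field is an assumption drawn from the "missing nullable fields" wording above.

```rust
/// Hypothetical, reduced type model (the List layer is omitted).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Utf8,
    Boolean,
    Struct(Vec<ToyField>),
    /// Key type and value type.
    Map(Box<ToyType>, Box<ToyType>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    ty: ToyType,
    nullable: bool,
}

/// Convenience constructor for a field.
fn field(name: &str, ty: ToyType, nullable: bool) -> ToyField {
    ToyField { name: name.to_string(), ty, nullable }
}

/// Returns true when data of type `source` can be read as `target`.
fn is_compatible(source: &ToyType, target: &ToyType) -> bool {
    match (source, target) {
        // Struct arm: every target field must either exist in the source
        // with a compatible type, or be nullable so it can be null-filled.
        // Extra source fields are ignored.
        (ToyType::Struct(src), ToyType::Struct(tgt)) => tgt.iter().all(|t| {
            match src.iter().find(|s| s.name == t.name) {
                Some(s) => is_compatible(&s.ty, &t.ty),
                None => t.nullable,
            }
        }),
        // Map arm: intercepts maps before the fallback and recurses, so an
        // evolved value struct is judged by the struct rules above.
        (ToyType::Map(sk, sv), ToyType::Map(tk, tv)) => {
            is_compatible(sk, tk) && is_compatible(sv, tv)
        }
        // Fallback: stand-in for a generic castability check.
        _ => source == target,
    }
}
```

Without the `Map` arm, two map types whose value structs differ would reach the fallback and be rejected, which mirrors the failure described under Problem; every other pair of types takes the same path as before.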