Prune unused passthrough columns from UNNEST output (opt-in) #18782
Open
gortiz wants to merge 1 commit into
Add an opt-in (default-off) query option `unnestColumnPruning` for the multi-stage engine. When enabled, a Project sitting directly above a CROSS JOIN UNNEST has its unreferenced input/passthrough columns - notably the unnested source array - dropped from the UnnestNode output, so the operator no longer copies them into every exploded row.

- UnnestNode carries a passthrough index map + prunedPassthrough flag (legacy constructors default to "copy whole row").
- RelToPlanNodeConverter fuses the pruning into convertLogicalProject: computes the referenced left columns, builds a pruned UnnestNode with recomputed element/ordinality indexes, and remaps the project refs. Falls back to current behavior in every other shape.
- UnnestOperator honors the passthrough map (primitive int[] hot path).
- Additive protobuf fields keep old-broker->new-server safe; the flag defaults off so a new broker never emits the smaller schema to an un-upgraded server (enable only after the whole fleet is upgraded).

Covered by planner, serde round-trip, operator, and integration tests (single/multi array, WITH ORDINALITY, zero-passthrough, array-also-selected).
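The index bookkeeping described in the commit message can be sketched as follows. This is a minimal illustration, not the code in `RelToPlanNodeConverter`: the class `UnnestPruningPlanSketch`, the `PrunedPlan` holder, and the `prune` method are hypothetical names, and the sketch assumes that the element/ordinality columns follow the left-input columns in the unpruned row (as in a `[col1, mcol1, s]` layout) and keep their relative order after pruning.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch of the index arithmetic only; not Pinot's RelToPlanNodeConverter.
public class UnnestPruningPlanSketch {
  /** Outcome of pruning for one Project sitting above one UNNEST. */
  public static final class PrunedPlan {
    public final int[] passthrough;  // retained left-input column indexes, ascending
    public final int[] remappedRefs; // the project's refs rewritten for the pruned output
    public final int rightOffset;    // output position of the first element/ordinality column

    PrunedPlan(int[] passthrough, int[] remappedRefs, int rightOffset) {
      this.passthrough = passthrough;
      this.remappedRefs = remappedRefs;
      this.rightOffset = rightOffset;
    }
  }

  /**
   * @param leftFieldCount number of columns contributed by the UNNEST input (left side)
   * @param projectRefs    input refs used by the Project, indexed against the unpruned row
   */
  public static PrunedPlan prune(int leftFieldCount, int[] projectRefs) {
    // 1. Collect the left columns the project actually references (sorted, de-duplicated).
    TreeSet<Integer> referenced = new TreeSet<>();
    for (int ref : projectRefs) {
      if (ref < leftFieldCount) {
        referenced.add(ref);
      }
    }
    int[] passthrough = referenced.stream().mapToInt(Integer::intValue).toArray();

    // 2. Element/ordinality columns now start right after the retained passthrough columns.
    int rightOffset = passthrough.length;

    // 3. Rewrite each ref against the pruned output row.
    int[] remapped = new int[projectRefs.length];
    for (int i = 0; i < projectRefs.length; i++) {
      int ref = projectRefs[i];
      remapped[i] = ref < leftFieldCount
          ? Arrays.binarySearch(passthrough, ref)   // position among retained left columns
          : rightOffset + (ref - leftFieldCount);   // shifted right-side column
    }
    return new PrunedPlan(passthrough, remapped, rightOffset);
  }

  public static void main(String[] args) {
    // Unpruned row [col1, mcol1, s]; the project references col1 (0) and s (2).
    PrunedPlan plan = prune(2, new int[] {0, 2});
    System.out.println(Arrays.toString(plan.passthrough));  // [0]
    System.out.println(Arrays.toString(plan.remappedRefs)); // [0, 1]
  }
}
```

When the project also references the source array, every left column ends up in `passthrough` and the refs map to themselves, which matches the "array-also-selected" no-op case listed in the tests.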
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18782 +/- ##
=============================================
- Coverage 64.78% 37.26% -27.52%
+ Complexity 1309 1308 -1
=============================================
Files 3380 3380
Lines 209544 209683 +139
Branches 32797 32836 +39
=============================================
- Hits 135746 78132 -57614
- Misses 62870 124429 +61559
+ Partials 10928 7122 -3806
Summary
Adds an opt-in, default-off query option `unnestColumnPruning` for the multi-stage engine that prunes input/passthrough columns, notably the unnested source array, from the `UNNEST` output when nothing downstream references them.

Today, for a query like:

the `UnnestNode` output schema is the full Calcite Correlate row type `[col1, mcol1, s]`, and `UnnestOperator` copies the entire input row (including the source array `mcol1`) into every one of the N exploded rows, only for a parent `Project` to immediately drop `mcol1`. For large arrays this needlessly widens every intermediate row (and serializes the array N times when an exchange sits between the UNNEST and the projecting operator).

The array expression is only needed in the operator's input (to evaluate the explode), never in its output, unless the user also selects it. With the flag on, the source array is no longer carried.
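To make the cost concrete, here is a minimal sketch of the two copy strategies. It is not the real `UnnestOperator`: the class `UnnestCopySketch` and its `explode` method are hypothetical, rows are modeled as plain `Object[]`, a single array column is unnested, and a `null` passthrough map stands in for the legacy (not pruned) mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; rows are plain Object[] and only one array column is unnested.
public class UnnestCopySketch {
  /**
   * @param inputRow    one input row, including the source array
   * @param arrayIndex  position of the source array inside inputRow
   * @param passthrough retained input column indexes, or null for the legacy whole-row copy
   * @return one output row per array element: passthrough columns followed by the element
   */
  public static List<Object[]> explode(Object[] inputRow, int arrayIndex, int[] passthrough) {
    Object[] elements = (Object[]) inputRow[arrayIndex];
    int keptCount = passthrough == null ? inputRow.length : passthrough.length;
    List<Object[]> out = new ArrayList<>(elements.length);
    for (Object element : elements) {
      Object[] row = new Object[keptCount + 1];
      if (passthrough == null) {
        // Legacy path: the whole input row, source array included, lands in every output row.
        System.arraycopy(inputRow, 0, row, 0, inputRow.length);
      } else {
        // Pruned path: copy only the retained columns, driven by a primitive int[].
        for (int i = 0; i < passthrough.length; i++) {
          row[i] = inputRow[passthrough[i]];
        }
      }
      row[keptCount] = element;
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    Object[] input = {"a", new Object[] {1, 2, 3}};
    // Legacy: every output row has 3 slots and still references the source array.
    System.out.println(explode(input, 1, null).get(0).length);
    // Pruned to col1 only: every output row is [col1, element].
    System.out.println(Arrays.toString(explode(input, 1, new int[] {0}).get(0)));
  }
}
```

In this model the source array is read from the input row in both modes; only the output rows differ, which is the distinction the paragraph above draws between the operator's input and its output.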
How it works
- `UnnestNode` carries a passthrough index map + `prunedPassthrough` flag. Legacy constructors default to not pruned (copy the whole input row), so existing behavior is unchanged.
- `RelToPlanNodeConverter` fuses the pruning into `convertLogicalProject`: when a `Project` sits directly above a Correlate/Uncollect (no wrapping correlate-filter), it computes which left columns the project actually references, builds a pruned `UnnestNode` (smaller output schema + passthrough map + recomputed element/ordinality indexes), and remaps that one project's `InputRef`s. It falls back to the current behavior in every other shape, and converts the correlate at most once.
- `UnnestOperator` honors the passthrough map, copying only retained columns (resolved to a primitive `int[]` so the per-row hot path stays allocation/box-free). When not pruned, it keeps the legacy whole-row `System.arraycopy`.

Backward compatibility / rolling upgrades
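`UnnestNode` is serialized broker→server and the operator runs server-side, so this is a two-sided change. The proto fields (`passthroughInputIndexes`, `prunedPassthrough`) are additive: a plan from an old broker leaves them unset, so a new server reads `prunedPassthrough=false` ⇒ legacy "copy whole row". Safe. In the other direction, the flag defaults off, so a new broker never emits the smaller schema to an un-upgraded server; enable it only after the whole fleet is upgraded.

Tests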
- `UnnestSqlPlannerTest`: flag on/off, source-array-also-selected (no-op), WITH ORDINALITY, multiple arrays, zero-passthrough.
- `PlanNodeSerDeTest`: protobuf round-trip for pruned (non-sequential indexes + ordinality) and legacy `UnnestNode`.
- `UnnestOperatorTest`: pruned single-array, zero-passthrough, WITH ORDINALITY, multiple arrays; legacy path unchanged.
- `UnnestIntegrationTest`: end-to-end pruned-vs-default result equality across single/multi array, WITH ORDINALITY, zero-passthrough, and array-also-selected shapes.

Notes for reviewers
The change touches `RelToPlanNodeConverter` + the node/operator + serde; the V2 physical path is untouched.
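For reviewers checking the rolling-upgrade reasoning, the sketch below models how a server could resolve the two additive fields after decoding. It is an assumption-laden illustration, not the actual serde code: `UnnestCompatSketch` and `effectivePassthrough` are hypothetical names, and it assumes proto3-style defaults, where an unset `bool` reads as `false` and an unset repeated field reads as empty.

```java
import java.util.Arrays;

// Hypothetical sketch of resolving the decoded fields; not Pinot's actual serde code.
public class UnnestCompatSketch {
  /**
   * @param prunedPassthrough       decoded flag; false when the sender never set it
   * @param passthroughInputIndexes decoded index list; empty when the sender never set it
   * @param inputFieldCount         number of columns in the UNNEST input row
   * @return the input column indexes the operator should copy into each output row
   */
  public static int[] effectivePassthrough(
      boolean prunedPassthrough, int[] passthroughInputIndexes, int inputFieldCount) {
    if (!prunedPassthrough) {
      // Plan from an old broker (or flag off): behave as "copy whole row".
      int[] identity = new int[inputFieldCount];
      for (int i = 0; i < inputFieldCount; i++) {
        identity[i] = i;
      }
      return identity;
    }
    // Pruned plan: an empty list here really means zero passthrough columns.
    return passthroughInputIndexes.clone();
  }

  public static void main(String[] args) {
    // Old broker: both fields unset, so the server copies both input columns.
    System.out.println(Arrays.toString(effectivePassthrough(false, new int[0], 2)));
    // New broker, flag on, only col1 retained.
    System.out.println(Arrays.toString(effectivePassthrough(true, new int[] {0}, 2)));
  }
}
```

Under these assumptions an empty index list alone cannot distinguish "field not sent" from "zero passthrough columns", which is one plausible reason to carry a separate `prunedPassthrough` flag.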