Skip to content

[SPARK-57158][SQL] ExplainUtils: extract operator ID assignment phase from processPlan into a private helper#56216

Open
markj-db wants to merge 2 commits into
apache:masterfrom
markj-db:mark-jarvin_data/explain-utils-generate-plan-ids
Open

[SPARK-57158][SQL] ExplainUtils: extract operator ID assignment phase from processPlan into a private helper#56216
markj-db wants to merge 2 commits into
apache:masterfrom
markj-db:mark-jarvin_data/explain-utils-generate-plan-ids

Conversation

@markj-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Extract the operator-ID-assignment phase of ExplainUtils.processPlan into a private assignOperatorIds(plan, idMap) helper. processPlan now initializes the idMap, delegates all ID assignment to assignOperatorIds, then performs the text-output pass over the returned subqueries and optimized-out exchanges.

Why are the changes needed?

processPlan conflated two independent sequential phases in one ~40-line body:

  1. ID assignment — traversing the plan tree, subqueries, and adaptively-optimized-out exchanges (SPARK-42753) to populate an IdentityHashMap with monotonically-increasing operator IDs.
  2. Text output — calling processPlanSkippingSubqueries on each discovered subtree to format the verbose explain string.

These phases are sequential and independent: the text-output pass only begins after ID assignment is fully complete. Extracting phase 1 into assignOperatorIds shortens processPlan to its output logic and makes the boundary between the two phases explicit. No behavior change.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added ExplainUtilsSuite covering:

  • Operator IDs assigned to all visible plan nodes are unique
  • processPlan sets CODEGEN_ID_TAG on nodes inside WholeStageCodegenExec
  • Thread-local localIdMap is restored to its prior value after processPlan returns
  • Subquery section is emitted in the explain output when subqueries are present

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

markj-db added 2 commits May 30, 2026 02:25
… from processPlan into a private helper

### What changes were proposed in this pull request?

Extract the operator-ID-assignment phase of `ExplainUtils.processPlan` into a
private `assignOperatorIds(plan, idMap)` helper. `processPlan` now initializes
the idMap, delegates all ID assignment to `assignOperatorIds`, then performs the
text-output pass over the returned subqueries and optimized-out exchanges.

### Why are the changes needed?

`processPlan` conflated two independent sequential phases in one 40-line body:
(1) populating an IdentityHashMap with operator IDs by traversing the plan tree,
subqueries, and adaptively-optimized-out exchanges (SPARK-42753); and (2)
calling `processPlanSkippingSubqueries` to format the verbose explain string.
Extracting phase 1 shortens `processPlan` to its output logic and makes the
boundary between the two phases explicit. No behavior change.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added `ExplainUtilsSuite` covering operator ID uniqueness, WholeStageCodegen tag
propagation, thread-local restoration, and subquery section output.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant