Skip to content

[SPARK-57174][SQL] Simplify Chr codegen by extracting a static Java helper#56224

Open
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-chr-codegen
Open

[SPARK-57174][SQL] Simplify Chr codegen by extracting a static Java helper#56224
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-chr-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Add ExpressionImplUtils.chr(long longVal) and route Chr's eval and codegen paths through it. Chr.doGenCode previously emitted a ~7-line inline if/else chain (negative -> empty string, (v & 0xFF) == 0 -> NUL, otherwise the Latin-1 character); it now emits a single ExpressionImplUtils.chr(...) call, and the eval path calls the same helper.

This is a plain (non-ANSI, non-try/catch) type-independent block, in line with the broadened goal of SPARK-56908 to move fixed generated-Java logic into static Java helpers.

Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the inline if/else chain to one call shrinks the generated Java for every stage that uses chr, helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *StringExpressionsSuite"

59/59 pass, including string for ascii which covers Chr over negative, zero, wrap-around (256), high-bit (149) and null inputs (exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…elper

### What changes were proposed in this pull request?

Add `ExpressionImplUtils.chr(long longVal)` and route `Chr`'s eval and codegen
paths through it. `Chr.doGenCode` previously emitted a ~7-line inline if/else
chain (negative -> empty string, `(v & 0xFF) == 0` -> NUL, otherwise the
Latin-1 character); it now emits a single `ExpressionImplUtils.chr(...)` call,
and the eval path calls the same helper.

This is a plain (non-ANSI, non-try/catch) type-independent block, in line with
the broadened goal of SPARK-56908 to move fixed generated-Java logic into static
Java helpers.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the inline if/else chain to one call
shrinks the generated Java for every stage that uses `chr`.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *StringExpressionsSuite"
```

59/59 pass, including `string for ascii` which covers `Chr` over negative, zero,
wrap-around (256), high-bit (149) and null inputs (exercised both with and
without whole-stage codegen).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
@LuciferYang
Copy link
Copy Markdown
Contributor

This is a very valuable topic. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants