[SPARK-57172][SQL] Simplify Crc32 codegen by extracting a static Java helper #56222

Open: gengliangwang wants to merge 1 commit into apache:master from gengliangwang:spark-crc32-codegen

What changes were proposed in this pull request?

Add ExpressionImplUtils.crc32(byte[] bytes) and route Crc32's eval and codegen paths through it. Crc32.doGenCode previously emitted a 3-line allocate / update / getValue sequence inline; it now emits a single ExpressionImplUtils.crc32(...) call, and the eval path calls the same helper.

This is a plain (non-ANSI, non-try/catch) type-independent block, in line with the broadened goal of SPARK-56908 to move fixed generated-Java logic into static Java helpers.
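For readers who want to see the shape of such a helper, here is a minimal sketch. It assumes the helper wraps `java.util.zip.CRC32` using the allocate / `update` / `getValue` sequence described above. The class name `Crc32HelperSketch` is hypothetical and stands in for `ExpressionImplUtils`; the actual method body in this PR may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-in for ExpressionImplUtils; not the code from this PR.
final class Crc32HelperSketch {
  private Crc32HelperSketch() {}

  // One static entry point that both the eval path and the generated Java
  // could call, replacing the inline allocate / update / getValue sequence.
  static long crc32(byte[] bytes) {
    CRC32 checksum = new CRC32();             // allocate
    checksum.update(bytes, 0, bytes.length);  // update over the whole array
    return checksum.getValue();               // unsigned 32-bit value as a long
  }

  public static void main(String[] args) {
    byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
    // "123456789" is the standard CRC-32 check input; prints cbf43926
    System.out.println(Long.toHexString(crc32(input)));
  }
}
```

With a helper of this shape, the generated code only needs to assign the result of one static call to the output variable, so the emitted source no longer repeats the three statements in every stage that computes `crc32`.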

Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the inline CRC32 sequence to one call shrinks the generated Java for every stage that computes crc32. Smaller generated methods help stay within the JVM's 64KB per-method bytecode limit and its constant-pool limits, and reduce Janino compile time and JIT work.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *HashExpressionsSuite"

40/40 pass, including crc32 (exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac