Skip to content

[SPARK-57168][SQL] Simplify GetTimestamp codegen under ANSI mode#56218

Open
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-get-timestamp-codegen
Open

[SPARK-57168][SQL] Simplify GetTimestamp codegen under ANSI mode#56218
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-get-timestamp-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Add DateTimeExpressionUtils.parseToTimestampExact(TimestampFormatter formatter, String input, long downScaleFactor, boolean forTimestampNTZ, String suggestedFuncOnFail) and route the ANSI (failOnError = true) eval and codegen paths of ToTimestamp (the base of GetTimestamp / to_timestamp, unix_timestamp, etc.) through it. The helper parses via parseWithoutTimeZone (TIMESTAMP_NTZ, no down-scaling) or parse + / downScaleFactor, and translates a DateTimeException (which also covers DateTimeParseException) or a ParseException to ansiDateTimeParseError; any other exception (e.g. IllegalStateException) propagates unchanged, matching the previous behavior.

ToTimestamp.doGenCode previously emitted the same 5-line try { parse } catch (DateTimeException) catch (ParseException) block at both string call sites (the cached-formatter path and the per-row-formatter path). A local parseTimestampCode now dispatches on failOnError: the ANSI branch emits a single parseToTimestampExact(...) call, while the non-ANSI branch keeps the inline try/catch -> isNull form (the same shape used by the already-merged MakeDate / MakeInterval cleanups). The eval path delegates to the same helper for consistency.

Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the duplicated inline try/catch to a single helper call shrinks the generated Java for the common to_timestamp / unix_timestamp family, helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *DateExpressionsSuite"

75/75 pass (exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

### What changes were proposed in this pull request?

Add `DateTimeExpressionUtils.parseToTimestampExact(TimestampFormatter formatter,
String input, long downScaleFactor, boolean forTimestampNTZ, String
suggestedFuncOnFail)` and route the ANSI (`failOnError = true`) eval and codegen
paths of `ToTimestamp` (the base of `GetTimestamp` / `to_timestamp` etc.) through
it. The helper parses via `parseWithoutTimeZone` (TIMESTAMP_NTZ, no down-scaling)
or `parse` + `/ downScaleFactor`, and translates a `DateTimeException` (which
covers `DateTimeParseException`) or `ParseException` to `ansiDateTimeParseError`;
any other exception (e.g. `IllegalStateException`) propagates unchanged, matching
the previous behavior.

`ToTimestamp.doGenCode` previously emitted the same 5-line
`try { parse } catch (DateTimeException) catch (ParseException)` block at both
string call sites (the cached-formatter path and the per-row-formatter path). A
local `parseTimestampCode` now dispatches on `failOnError`: the ANSI branch emits
a single `parseToTimestampExact(...)` call; the non-ANSI branch keeps the inline
`try/catch -> isNull`. The eval path delegates to the same helper for consistency.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the duplicated inline try/catch to a
single helper call shrinks the generated Java for the common `to_timestamp` /
`unix_timestamp` family.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *DateExpressionsSuite"
```

75/75 pass (exercised both with and without whole-stage codegen).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant