Skip to content

[SPARK-57167][SQL] Simplify MakeTimestamp codegen under ANSI mode#56217

Open
gengliangwang wants to merge 2 commits into
apache:masterfrom
gengliangwang:spark-make-timestamp-codegen
Open

[SPARK-57167][SQL] Simplify MakeTimestamp codegen under ANSI mode#56217
gengliangwang wants to merge 2 commits into
apache:masterfrom
gengliangwang:spark-make-timestamp-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Extend DateTimeExpressionUtils.java with two static helpers and route MakeTimestamp's eval and codegen paths through them:

  • makeTimestampMicros(int year, int month, int day, int hour, int min, Decimal secAndMicros, ZoneId zoneId, boolean timestampNTZ): the shared, exception-raising core. It computes the micros (including the leap-second sec = 60 rollover supported for PostgreSQL compatibility), throwing SparkDateTimeException for an invalid fraction of second and DateTimeException for an invalid year/month/day/hour/min combination.
  • makeTimestampExact(...): the ANSI (failOnError = true) wrapper. It rethrows SparkDateTimeException as-is (to preserve its message) and translates any other DateTimeException to ansiDateTimeArgumentOutOfRange. SparkDateTimeException is caught first because it is itself a DateTimeException.

MakeTimestamp.toMicros (eval) and doGenCode now dispatch on failOnError: the ANSI path emits a single makeTimestampExact(...) call, while the non-ANSI path calls makeTimestampMicros(...) inside the existing inline try/catch -> isNull form (matching the pattern used by the already-merged MakeDate / MakeInterval cleanups).

Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI branch of MakeTimestamp.doGenCode previously emitted a ~22-line inline block (Decimal floor/nanos math, LocalDateTime.of, leap-second handling, the timezone/NTZ conversion, and a two-arm try/catch mapping to the ANSI errors) into every generated stage that calls make_timestamp. Collapsing it to a single helper call shrinks the generated Java source, helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work. As a side benefit, eval and codegen now share one implementation, removing a latent divergence (codegen previously derived seconds/nanos via Decimal.floor while eval used toUnscaledLong).

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *DateExpressionsSuite"

75/75 pass (covers make_timestamp / make_timestamp_ntz / make_timestamp_ltz, exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

### What changes were proposed in this pull request?

Extend `DateTimeExpressionUtils.java` with two static helpers and route
`MakeTimestamp`'s eval and codegen paths through them:

* `makeTimestampMicros(year, month, day, hour, min, Decimal secAndMicros,
  ZoneId zoneId, boolean timestampNTZ)`: the shared, exception-raising core.
  It computes the micros (leap-second `sec = 60` rollover for PostgreSQL
  compatibility included), throwing `SparkDateTimeException` for an invalid
  fraction of second and `DateTimeException` for an invalid
  year/month/day/hour/min.
* `makeTimestampExact(...)`: the ANSI (`failOnError = true`) wrapper. It
  rethrows `SparkDateTimeException` as-is (preserving its message) and
  translates any other `DateTimeException` to `ansiDateTimeArgumentOutOfRange`.

`MakeTimestamp.toMicros` (eval) and `doGenCode` now dispatch on `failOnError`:
the ANSI path emits a single `makeTimestampExact(...)` call; the non-ANSI path
calls `makeTimestampMicros(...)` inside the existing inline `try/catch -> isNull`.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI branch of `MakeTimestamp.doGenCode`
previously emitted a ~22-line inline block (Decimal floor/nanos math,
`LocalDateTime.of`, leap-second handling, the timezone/NTZ conversion, and a
two-arm `try/catch` mapping to the ANSI errors) into every generated stage that
calls `make_timestamp`. Collapsing it to one helper call shrinks the generated
Java source. As a side benefit, eval and codegen now share one implementation,
removing a latent divergence (codegen previously derived seconds/nanos via
`Decimal.floor` while eval used `toUnscaledLong`).

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *DateExpressionsSuite"
```

75/75 pass (covers make_timestamp / make_timestamp_ntz / make_timestamp_ltz,
exercised both with and without whole-stage codegen).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant