
Pull 4.2 branch fixes #44

Draft
giggsoff wants to merge 64 commits into develop/4.3.0/4.2.0.1 from hotfix/NGSOK-1746


@giggsoff

Collaborator

Pull changes from 4.2 branch

Huaxin Gao and others added 30 commits June 12, 2026 18:53
…TypeSuite to the Java test source root

### What changes were proposed in this pull request?

Move `JavaGeographyTypeSuite` and `JavaGeometryTypeSuite` from
`sql/core/src/test/scala/org/apache/spark/sql/types/` to
`sql/core/src/test/java/org/apache/spark/sql/types/`.

These are Java (JUnit) test suites that were placed under the Scala test source
root when they were introduced in apache#52623 ([SPARK-53920]). This relocates them to
the standard Java test source root. The package (`org.apache.spark.sql.types`)
is unchanged and there is no logic change.

### Why are the changes needed?

Java test sources belong under `src/test/java`, not `src/test/scala`. Keeping
them in the Scala source tree is inconsistent with the standard Maven/SBT
project layout.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests. This is a pure file relocation with no logic change.

### Was this patch authored or co-authored using generative AI tooling?

Yes, prepared with assistance from an AI coding agent.

Closes apache#56492 from cloud-fan/move-java-geo-typesuites-to-java-root.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit c78c0c8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… LEFT OUTER

### What changes were proposed in this pull request?

Followup to apache#55682.

In `RewriteNearestByJoin`, when the `NEAREST BY` join type is `LEFT OUTER`, the synthesized `Join` widens the right-side columns to nullable. However, the synthesized `Aggregate` (and the optional `__ranking__` `Project`) built on top of that join still referenced the right-side columns via `right.output` and `rankingExpression` with their original (non-nullable) nullability. As a result, the rewritten plan could declare a right-side column as non-nullable while its child -- the join -- produces it as nullable.

This PR maps the right-side attributes to their widened (nullable) form for `LEFT OUTER` and rewrites both the `CreateStruct(right.*)` and the ranking expression to use that widened nullability, so the rewritten plan's schema is consistent with its child. For `INNER` joins the right side is not widened, so this is a no-op.
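
The invariant can be illustrated with a toy Java model. This is a sketch, not Catalyst code: `Attr`, `NearestJoinType`, and `NullabilityWidening` are hypothetical names, and real attributes also carry expression IDs and data types. It only shows which nullability the operators above the join should observe for each join type.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified attribute: only a name and a nullability flag.
record Attr(String name, boolean nullable) {
  Attr withNullability(boolean newNullable) {
    return new Attr(name, newNullable);
  }
}

enum NearestJoinType { INNER, LEFT_OUTER }

final class NullabilityWidening {
  private NullabilityWidening() {}

  // Right-side attributes as the synthesized join produces them: widened to
  // nullable for LEFT OUTER, returned unchanged (a no-op) for INNER.
  static List<Attr> rightOutputAboveJoin(List<Attr> rightOutput, NearestJoinType joinType) {
    if (joinType == NearestJoinType.INNER) {
      return rightOutput;
    }
    return rightOutput.stream()
        .map(a -> a.withNullability(true))
        .collect(Collectors.toList());
  }
}
```

Any expression built above the join (the `CreateStruct(right.*)` and the ranking expression, in the text above) would reference the list returned here rather than the original right-side output.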

### Why are the changes needed?

Without this fix the rewritten plan for a `LEFT OUTER NEAREST BY` declares right-side columns non-nullable while its join child produces them nullable -- an inconsistency that nullability/plan-integrity validation flags as a regression.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added a regression test to `RewriteNearestByJoinSuite` that, for both `INNER` and `LEFT OUTER`, asserts every right-side attribute the synthesized `Aggregate` references agrees on nullability with its join child. The test uses **non-nullable** right-side columns so that `LEFT OUTER`'s widening is observable -- it fails without this fix (`x#.. declared nullable=false but its child produces nullable=true`) and passes with it, while `INNER` stays a no-op. The suite's expected-plan helper was also updated to mirror the widened nullability.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56484 from cloud-fan/SPARK-56395-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 62e4e16)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…auncher under CI load

### What changes were proposed in this pull request?

`SparkLauncherSuite.testInProcessLauncher` waits for the connection between the in-process app and the launcher to be established by polling the app handle state with `eventually(Duration.ofSeconds(5), Duration.ofMillis(10))`. Under heavy CI load this 5-second window is too short: the handle can remain in `UNKNOWN` for longer, causing the test to fail with:

```
java.lang.IllegalStateException: Failed check after 476 tries: expected: not equal but was: <UNKNOWN>.
  at org.apache.spark.launcher.BaseSuite.eventually(BaseSuite.java:88)
  at org.apache.spark.launcher.SparkLauncherSuite.inProcessLauncherTestImpl(SparkLauncherSuite.java:162)
  at org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher(SparkLauncherSuite.java:130)
```

This change increases the timeout to 30 seconds with a 100ms poll interval, consistent with `waitForSparkContextShutdown` (30s/100ms) and the other `eventually` calls in this suite (60s/1000ms).
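
For reference, a polling helper with this timeout/interval shape might look like the sketch below. `Polling.eventually` is a hypothetical stand-in; the actual `BaseSuite.eventually` in the launcher tests may have a different signature and error message.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class Polling {
  private Polling() {}

  // Hypothetical helper: re-evaluates `condition` every `period` until it holds
  // or `timeout` elapses, then fails with the number of attempts made.
  static void eventually(Duration timeout, Duration period, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeout.toNanos();
    int tries = 0;
    while (true) {
      tries++;
      if (condition.getAsBoolean()) {
        return;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("Failed check after " + tries + " tries");
      }
      try {
        Thread.sleep(period.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while polling", e);
      }
    }
  }
}
```

With 5s/10ms such a loop runs at most roughly 500 times, which is the same order as the 476 tries in the failure above; with 30s/100ms it polls less often but waits six times longer.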

### Why are the changes needed?

`SparkLauncherSuite.testInProcessLauncher` is flaky under CI load. The change only relaxes a test timeout; it does not change production behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test `SparkLauncherSuite.testInProcessLauncher`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes apache#56529 from dbtsai/dbtsai/fix-flaky-inprocess-launcher.

Authored-by: DB Tsai <dbtsai@dbtsai.com>
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
(cherry picked from commit 1cf08e6)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
### What changes were proposed in this pull request?

Revise a `DEBUG` log message in `ThriftHttpServlet`.

### Why are the changes needed?

To be more secure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. This is a logging-only change.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56491 from dongjoon-hyun/SPARK-57429.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit b6450e6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5f11a43)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…teArrays.decode`

### What changes were proposed in this pull request?

`Encoders.ByteArrays.decode()` now validates the length prefix before allocating the array via `Objects.checkFromIndexSize(0, length, buf.readableBytes())`, which requires `0 <= length <= buf.readableBytes()` and throws `IndexOutOfBoundsException` otherwise. After `readInt()`, the reader index is past the prefix, so `readableBytes()` equals the remaining payload bytes.

- Note that we use the Java built-in `Objects.checkFromIndexSize` because we cannot use **Netty's `checkReadableBytes`**. However, both methods throw `IndexOutOfBoundsException` identically for the error cases.
  - https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
  - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)
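
The validate-then-allocate pattern can be sketched with the JDK alone. This sketch assumes a 4-byte big-endian length prefix and uses `java.nio.ByteBuffer` in place of Netty's `ByteBuf` (`remaining()` standing in for `readableBytes()`); `LengthPrefixedBytes` is a hypothetical class, not Spark's `Encoders`.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

final class LengthPrefixedBytes {
  private LengthPrefixedBytes() {}

  // Writes a 4-byte big-endian length prefix followed by the payload.
  static ByteBuffer encode(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
    buf.putInt(payload.length).put(payload);
    buf.flip();
    return buf;
  }

  // Reads the prefix, then checks 0 <= length <= remaining() BEFORE allocating,
  // so a bad prefix raises IndexOutOfBoundsException instead of
  // NegativeArraySizeException or OutOfMemoryError from new byte[length].
  static byte[] decode(ByteBuffer buf) {
    int length = buf.getInt();
    Objects.checkFromIndexSize(0, length, buf.remaining());
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return bytes;
  }
}
```

Decoding a prefix of `Integer.MAX_VALUE` or `-1` with too few bytes behind it throws `IndexOutOfBoundsException` before any allocation happens.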

### Why are the changes needed?

`decode()` allocated `new byte[length]` from an untrusted length.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error.

Note that this `OutOfMemoryError` can occur even in a secured environment, because the first message on a not-yet-authenticated channel is parsed during the authentication handshake:

https://github.com/apache/spark/blob/b6450e6765a2fbe307c0b10d744da3ab4583ba44/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java#L88-L90

https://github.com/apache/spark/blob/b6450e6765a2fbe307c0b10d744da3ab4583ba44/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthMessage.java#L53-L64

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases in `EncodersSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56493 from dongjoon-hyun/SPARK-57430.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5c29a93)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 704bfef)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…rings.decode`

### What changes were proposed in this pull request?

`Encoders.Strings.decode()` now validates the length prefix before allocating the array, requiring `0 <= length <= buf.readableBytes()` and throwing `IndexOutOfBoundsException` otherwise. After `readInt()`, the reader index is past the prefix, so `readableBytes()` equals the remaining payload bytes.

- Note that we use the Java built-in `Objects.checkFromIndexSize` because we cannot use **Netty's `checkReadableBytes`**. However, both methods throw `IndexOutOfBoundsException` identically for the error cases.
  - https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
  - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)

### Why are the changes needed?

`decode()` allocated `new byte[length]` from an untrusted length.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error.

Note that this `OutOfMemoryError` can occur even in a secured environment, because the first message on a not-yet-authenticated channel is parsed during the SASL handshake:

https://github.com/apache/spark/blob/67eafbddf7962185b2399b97241a008cf8d0d333/common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java#L72-L80

https://github.com/apache/spark/blob/67eafbddf7962185b2399b97241a008cf8d0d333/common/network-common/src/main/java/org/apache/spark/network/sasl/SaslMessage.java#L68-L74

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases.

Without this PR, `testStringsDecodeShouldFailWhenLengthExceedsReadableBytes` fails with `OutOfMemoryError`.
```
[info] Test org.apache.spark.network.protocol.EncodersSuite#testStringsDecodeShouldFailWhenLengthExceedsReadableBytes() started
[0.473s][warning][oom,vendor] java.lang.OutOfMemoryError occurred: Requested array size exceeds VM limit
[error] java.lang.OutOfMemoryError: Requested array size exceeds VM limit

```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56488 from dongjoon-hyun/SPARK-57426.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 768569d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a90d3a9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…nt|Long)Arrays` decode

### What changes were proposed in this pull request?

`Encoders.IntArrays.decode()` and `Encoders.LongArrays.decode()` now validate the
element-count prefix before allocating the array.

Unlike the 1-byte-element `Encoders.Strings` / `Encoders.ByteArrays` cases
(SPARK-57426, SPARK-57430), where the byte length maps 1:1 to readable bytes, each
element here occupies a fixed multiple of bytes, so the upper bound is divided by the
element size:

- `IntArrays`: `Objects.checkFromIndexSize(0, numInts, buf.readableBytes() / 4)` (4 bytes per `int`)
- `LongArrays`: `Objects.checkFromIndexSize(0, numLongs, buf.readableBytes() / 8)` (8 bytes per `long`)

This requires `0 <= count <= (remaining bytes / element size)` and throws
`IndexOutOfBoundsException` otherwise. After `readInt()`, the reader index is past the
count prefix, so `buf.readableBytes() / elementSize` is the maximum number of elements
actually decodable from the remaining payload.

- Note that we use the Java built-in `Objects.checkFromIndexSize` because we cannot use
  **Netty's `checkReadableBytes`**. However, both methods throw `IndexOutOfBoundsException`
  identically for the error cases.
  - https://netty.io/4.2/api/io/netty/buffer/AbstractByteBuf.html#checkReadableBytes(int)
  - https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Objects.html#checkFromIndexSize(int,int,int)
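
A JDK-only sketch of the `LongArrays` variant, again using `java.nio.ByteBuffer` as a stand-in for Netty's `ByteBuf`; `CountPrefixedLongs` is hypothetical. The integer division matters: with 15 readable bytes only one whole `long` fits, so a count of 2 is rejected.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

final class CountPrefixedLongs {
  private CountPrefixedLongs() {}

  // Reads a 4-byte element count, validates it against the number of whole
  // 8-byte longs that remain, and only then allocates and fills the array.
  static long[] decode(ByteBuffer buf) {
    int numLongs = buf.getInt();
    Objects.checkFromIndexSize(0, numLongs, buf.remaining() / Long.BYTES);
    long[] values = new long[numLongs];
    for (int i = 0; i < numLongs; i++) {
      values[i] = buf.getLong();
    }
    return values;
  }
}
```

The `IntArrays` case is identical with `Integer.BYTES` (4) as the divisor.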

### Why are the changes needed?

`decode()` allocated `new int[numInts]` / `new long[numLongs]` from an untrusted count prefix.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error, consistent with SPARK-57426 (`Strings`)
and SPARK-57430 (`ByteArrays`).

These two decoders are used by the shuffle wire-protocol messages, e.g.
`FetchShuffleBlocks`, `FetchShuffleBlockChunks`, `MergeStatuses`, and `LocalDirsForExecutors`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56500 from dongjoon-hyun/SPARK-57439.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6330518)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5087cbe)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ma` comment queries

### What changes were proposed in this pull request?

This PR applies `escapeSql` to the `comment` argument of the JDBC dialect comment query
methods: `JdbcDialect.getTableCommentQuery`, `JdbcDialect.getSchemaCommentQuery`, and the
`MySQLDialect.getTableCommentQuery` override.

### Why are the changes needed?

The `comment` was embedded into a single-quoted SQL literal without escaping, so a comment
containing `'` breaks the statement. `escapeSql` doubles single quotes (`'` -> `''`):

- Before: `COMMENT ON TABLE t IS 'a'b'` (invalid)
- After: `COMMENT ON TABLE t IS 'a''b'` (correct)

This is consistent with how string values are already escaped:

https://github.com/apache/spark/blob/875982c4a65bc703bce88fb17683b423d8f88255/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L384-L385
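
The effect of the quote doubling on the generated statement can be sketched in Java. `SqlLiterals` and `tableCommentQuery` are hypothetical; the Spark helper is the Scala `JdbcDialect.escapeSql`, and each real dialect builds its statement in its own way.

```java
final class SqlLiterals {
  private SqlLiterals() {}

  // Doubles every single quote so the value stays inside one SQL string literal.
  static String escapeSql(String value) {
    return value == null ? null : value.replace("'", "''");
  }

  // Hypothetical helper that embeds a comment into a COMMENT ON TABLE statement.
  static String tableCommentQuery(String table, String comment) {
    return "COMMENT ON TABLE " + table + " IS '" + escapeSql(comment) + "'";
  }
}
```

For example, `tableCommentQuery("t", "a'b")` yields `COMMENT ON TABLE t IS 'a''b'`, the "After" form listed above.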

### Does this PR introduce _any_ user-facing change?

Yes. A table or schema comment containing a single quote is now stored correctly instead
of failing with a SQL syntax error.

### How was this patch tested?

Pass the CIs with a newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56506 from dongjoon-hyun/SPARK-57446.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 072994d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 012a3d8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ct.indexExists`

### What changes were proposed in this pull request?

`indexExists` in `H2Dialect`, `MySQLDialect`, and `PostgresDialect` embeds the
index name as a SQL string literal in its lookup query. This PR wraps the index
name with the existing `JdbcDialect.escapeSql` helper so that single quotes are
properly escaped.

Note that this PR covers only index names. Table names and `TABLE_SCHEMA` will be handled separately.

### Why are the changes needed?

If an index name contains a single quote, the lookup query becomes malformed
(e.g. `... INDEX_NAME = 'a'b'`). `JdbcUtils.checkIfIndexExists` swallows the
resulting exception and returns `false`, so an existing index whose name
contains a single quote is incorrectly reported as not existing.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes apache#56508 from dongjoon-hyun/SPARK-57447.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 86db727)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ab289d6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…cret` with Master REST server

### What changes were proposed in this pull request?

This PR removes the check in the standalone `Master` that rejects `spark.authenticate.secret` when the Master REST server (`spark.master.rest.enabled`) is enabled.

https://github.com/apache/spark/blob/088071d869dee0cb433c5e72ba2e7851e332b391/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L138-L144

For the record, the check was introduced in Apache Spark 2.4.0 and is now outdated.
- apache#22071

### Why are the changes needed?

`spark.authenticate.secret` (the RPC authentication secret) and `spark.master.rest.enabled` (the standalone submission REST server) are independent concerns, but the removed check coupled them by failing Master startup whenever both were set.

Since `spark.master.rest.enabled` defaults to `true` as of Apache Spark 4.1.0, this check forced any cluster using RPC authentication to disable the REST server. This block is unnecessary because the REST server is protected independently by the following changes:
- apache#47595 (Apache Spark 4.0.0)
- apache#47596
- apache#49894 (Apache Spark 4.1.0)

### Does this PR introduce _any_ user-facing change?

Previously, starting a standalone Master with `spark.authenticate.secret` set and `spark.master.rest.enabled=true` (the default) failed with an `IllegalArgumentException`. After this PR, the Master starts normally with both configured securely.

This is a bug fix that enables a previously blocked code path, so technically there is no loss from the user's perspective.

### How was this patch tested?

Added a unit test in `MasterSuite` that verifies a `Master` can be created with both `spark.master.rest.enabled=true` and `spark.authenticate.secret` set.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56511 from dongjoon-hyun/SPARK-57451.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ff36aac)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0a95243)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR strengthens the secret used by `CookieSigner` for HTTP cookie-based authentication in `ThriftHttpServlet`.

Previously, the secret was derived from a single `SecureRandom.nextLong()` (64 bits of entropy), converted to a decimal `String`, and then turned back into bytes via `String.getBytes()`. This PR instead generates 32 cryptographically secure random bytes (256 bits) and passes them directly to `CookieSigner`, which already accepts a `byte[]`.

Before:
```java
String secret = Long.toString(RAN.nextLong());
this.signer = new CookieSigner(secret.getBytes());
```

After:
```java
byte[] secret = new byte[32];
RAN.nextBytes(secret);
this.signer = new CookieSigner(secret);
```

### Why are the changes needed?

- The previous secret carried only **64 bits of entropy** and was encoded as a decimal string, which is weak key material for the SHA-512-based `CookieSigner`.
- The `long -> String -> bytes` round-trip relied on `String.getBytes()` with the platform default charset, an unnecessary and platform-dependent step.
- Using 256 bits of secure-random material removes both concerns. `CookieSigner` consumes the secret as raw bytes, so no string encoding is needed.

### Does this PR introduce _any_ user-facing change?

No. The cookie-signing secret is generated internally per server instance and is never persisted or exposed. The cookie format, signing algorithm, and authentication flow are unchanged.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56552 from dongjoon-hyun/SPARK-57498.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 065b62e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ringArrays.decode`

### What changes were proposed in this pull request?

`Encoders.StringArrays.decode()` now validates the count prefix before allocating the array via `Objects.checkFromIndexSize(0, numStrings, buf.readableBytes() / 4)`, which requires `0 <= numStrings <= buf.readableBytes() / 4` and throws `IndexOutOfBoundsException` otherwise. The divisor `4` is the minimum on-wire size of one element, since `Strings.encode` always writes a 4-byte length prefix:

https://github.com/apache/spark/blob/118026dc9b9693613aab96ae6f4a3c126c6c3fdf/common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java#L37-L41
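
Because string elements are variable-length, the check uses the minimum element size (a bare 4-byte prefix), so it rejects impossible counts but not every malformed frame; each element's own length is still validated while reading. A JDK-only sketch, assuming each element is a 4-byte length prefix followed by UTF-8 bytes; `CountPrefixedStrings` is hypothetical and `ByteBuffer` stands in for Netty's `ByteBuf`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

final class CountPrefixedStrings {
  private CountPrefixedStrings() {}

  static String[] decode(ByteBuffer buf) {
    int numStrings = buf.getInt();
    // Every element needs at least its 4-byte length prefix (an empty string is
    // just the prefix), so remaining() / 4 bounds how many strings can follow.
    Objects.checkFromIndexSize(0, numStrings, buf.remaining() / Integer.BYTES);
    String[] strings = new String[numStrings];
    for (int i = 0; i < numStrings; i++) {
      int length = buf.getInt();
      // Per-element check, as in the single-string decoder.
      Objects.checkFromIndexSize(0, length, buf.remaining());
      byte[] bytes = new byte[length];
      buf.get(bytes);
      strings[i] = new String(bytes, StandardCharsets.UTF_8);
    }
    return strings;
  }
}
```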

### Why are the changes needed?

`decode()` allocated `new String[numStrings]` from an untrusted count.
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

Validating first fails fast with a clear error.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56521 from dongjoon-hyun/SPARK-57474.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0caf790)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 727db8f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…tmapArrays.decode`

### What changes were proposed in this pull request?

`Encoders.BitmapArrays.decode()` now validates the count prefix before allocating the array via `Objects.checkFromIndexSize(0, numBitmaps, buf.readableBytes() / 8)`, which requires `0 <= numBitmaps <= buf.readableBytes() / 8` and throws `IndexOutOfBoundsException` otherwise. The divisor `8` is the minimum on-wire size of one element, since an empty `RoaringBitmap` serializes to 8 bytes (a 4-byte cookie followed by a 4-byte size).

### Why are the changes needed?

To make `decode()` robust against malformed input.

Previously, `decode()` allocated `new RoaringBitmap[numBitmaps]` without validating `numBitmaps`, so:
- A negative value throws an opaque `NegativeArraySizeException`.
- An oversized value can trigger `OutOfMemoryError` on a corrupt or hostile frame.

The `numBitmaps` count comes from a `MergeStatuses` reply that the driver decodes in
`ExternalBlockStoreClient.finalizeShuffleMerge`. The server that produces this reply is
pluggable via `spark.shuffle.push.server.mergedShuffleFileManagerImpl`, so the frame may be
encoded by a third-party implementation rather than Spark's own `Encoders`. Since the driver
decodes bytes it did not encode, a version mismatch, an implementation bug, or a truncated
frame can reach this allocation. Validating first fails fast with a clear error, consistent
with the other array decoders in `Encoders`.

https://github.com/apache/spark/blob/0caf79030573233c6db526a4289499f8756adeaf/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L545-L554

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56523 from dongjoon-hyun/SPARK-57475.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2f4d32a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit e9286c5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Update the commit SHAs of the following Docker-related GitHub Actions in `branch-4.2` to match the ones registered in the Apache organization's GitHub Actions allowlist:

- `docker/login-action`
- `docker/setup-qemu-action`
- `docker/setup-buildx-action`
- `docker/build-push-action`

### Why are the changes needed?

The `master` branch was already updated to the new SHAs, but `branch-4.2` still had the old ones that are no longer in the allowlist.

- apache#55687

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should pass with this change.

### Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Claude

Closes apache#56555 from sarutak/update-docker-actions-branch-4.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@apache.org>
…cessorCount.enabled` to yarn module

Port apache#56541 to branch-4.2

### What changes were proposed in this pull request?

Move the definition of the config `spark.yarn.am.limitActiveProcessorCount.enabled` from the core module to the yarn module.

### Why are the changes needed?

The YARN AM config should be defined in the yarn module, not in core.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56563 from pan3793/SPARK-53209-followup-4.2.

Authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…erated Javadoc

### What changes were proposed in this pull request?

Spark publishes both a Scaladoc and a Javadoc API site. The Javadoc is generated from Scala sources by genjavadoc, and it currently exposes a large number of internal types that the Scaladoc correctly hides.

The root cause: a top-level `private[x]` Scala type (e.g. `private[spark] trait SupportsDelegationToken`) compiles to a JVM-`public` symbol. genjavadoc emits a `public` Java stub for it even with `-P:genjavadoc:strictVisibility=true`, and the Javadoc `-public` option can't filter it because the stub genuinely is public. Scaladoc, by contrast, honors the access qualifier and drops these types.

This PR adds a filter to `JavaUnidoc / unidoc / unidocAllSources` (alongside the existing `ignoreUndocumentedPackages`) that drops a generated stub `<module>/target/java/<pkg>/<Name>.java` **iff every top-level Scala declaration of `<Name>` in that package is `private[...]`**. A public class with a `private[...]` companion object (e.g. `SparkConf` — public `class`, `private[spark] object`) is kept, since the class itself is public.
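
The keep/drop rule can be modeled as a small predicate. This is a hypothetical Java model (`ScalaDecl`, `JavadocStubFilter`), not the build's actual Scala filter, which works on source files and generated stub paths.

```java
import java.util.List;

// Hypothetical model of one top-level Scala declaration (class, trait, or object)
// and whether it carries a qualified-private modifier such as private[spark].
record ScalaDecl(String kind, boolean qualifiedPrivate) {}

final class JavadocStubFilter {
  private JavadocStubFilter() {}

  // Drop the generated Java stub only if every top-level declaration with that
  // name in the package is qualified-private; keep it if any one is public.
  static boolean shouldDropStub(List<ScalaDecl> declsWithSameName) {
    return !declsWithSameName.isEmpty()
        && declsWithSameName.stream().allMatch(ScalaDecl::qualifiedPrivate);
  }
}
```

The empty-list guard matters because `allMatch` returns `true` on an empty stream; without it, a name with no Scala declaration at all would be dropped. `SparkConf` (public `class` plus `private[spark] object`) maps to a mixed list and is kept.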

### Why are the changes needed?

The published Javadoc lists ~1.3k internal types (e.g. `BarrierCoordinator`, `ContextCleaner`, `ExecutorAllocationManager`, scheduler RPC messages, `SupportsDelegationToken`) that are `private[spark]` in source and are absent from the Scaladoc. This both misleads Java users about the public API surface and makes the two API docs disagree on which types are public. Filtering them aligns the Java API doc with the Scala one (format still differs, coverage now matches) without touching genuinely Java-authored public APIs.

### Does this PR introduce _any_ user-facing change?

No code/runtime change. The only user-facing effect is on the generated Javadoc site: top-level `private[spark]` (and other qualified-private) Scala types no longer appear as public Java classes. Genuinely public APIs — including Java-authored ones (`src/main/java`, e.g. the DataSource V2 connector interfaces) and Java-friendly wrappers like `org.apache.spark.api.java.JavaRDD` — are unaffected.

### How was this patch tested?

- Validated the filter selects exactly the package-private stubs against the already-generated `*/target/java` stubs across `core`, `sql/core`, `sql/api`, `sql/catalyst`, `mllib`, `streaming`: it drops the `private[spark]` leaks (`SupportsDelegationToken`, `StructuredStreamingIdAwareSchedulerLogging`, `InternalAccumulator`, ~1.3k total) while keeping public types and public-class-with-private-companion cases (`SparkConf`, `SparkContext`, `TaskContext`, `RDD`).
- Confirmed the build definition compiles via `build/sbt reload`.
- A full `build/sbt unidoc` run is the end-to-end integration check; relying on CI's docs build for that.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Isaac)

This pull request and its description were written by Isaac.

Closes apache#56538 from cloud-fan/genjavadoc-exclude-package-private.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 5680d15)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

An audit of Spark 4.2 commits (SPARK-57452) found a number of user-facing behavior changes that were not documented in the migration guides. This PR adds entries for the subset of those changes that **require a user to take action to migrate**, across three guides:

PySpark (`python/docs/source/migration_guide/pyspark_upgrade.rst`):
- PyPy is no longer officially supported.
- Minimum pandas version for Spark Connect raised to 2.2.0.
- `createDataFrame` from a NumPy `ndarray` now requires PyArrow and may infer a different schema.
- Pandas UDFs receive nullable integer columns as a pandas nullable `Int` dtype instead of `float64`.
- `Observation.get` raises on metric collection failure instead of returning an empty dictionary.
- `DataFrame.drop` / `Series.drop` (pandas API on Spark) raise `KeyError` if any label is missing.
- Python Data Sources fail with `DATA_SOURCE_RETURN_SCHEMA_MISMATCH` on schema mismatch.
- Python streaming data sources fail when reported offsets do not advance.

SQL (`docs/sql-migration-guide.md`):
- Duplicate CTE names within a `WITH` clause are detected case-insensitively.
- `NATURAL JOIN` honors `spark.sql.caseSensitive` when choosing join columns.
- Built-in parameterless functions take precedence over same-named UDFs.
- `SET CATALOG <name>` resolves a bare name as a session variable first.

Structured Streaming (`docs/streaming/ss-migration-guide.md`):
- Restarting from a checkpoint with missing required metadata now fails by default.

Behavior changes that are purely cosmetic/UI, internal, additive, or correctness fixes requiring no user action were intentionally excluded.

### Why are the changes needed?

Without these entries, users upgrading to Spark 4.2 can hit silent result changes, new errors on previously-valid input, or dependency/environment requirements with no documented guidance on how to migrate.

### Does this PR introduce _any_ user-facing change?

No. This is a documentation-only change.

### How was this patch tested?

Documentation-only change; verified by review of the rendered Markdown/reStructuredText.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Closes apache#56553 from cloud-fan/SPARK-57452-migration-guide.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 11e4a42)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…` for disallowed `/workers/kill` requests

### What changes were proposed in this pull request?

When the Master Web UI `/workers/kill` endpoint rejects a decommission request, return `403 FORBIDDEN` instead of `405 METHOD_NOT_ALLOWED`.

```diff
 if (!isDecommissioningRequestAllowed(req)) {
-  resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
+  resp.sendError(HttpServletResponse.SC_FORBIDDEN)
 }
```

This is a subtask of SPARK-55556 `Improve Web Security`.

### Why are the changes needed?

The original feature was introduced in Apache Spark 3.1.0.
- apache#29015

The request is rejected by policy/origin (`spark.master.ui.decommission.allow.mode` is `DENY`, or `LOCAL` with a non-local caller), not because of the HTTP method. `405` means the method is unsupported and must carry an `Allow` header; `403` ("understood the request but refuses to authorize it") is the correct code, consistent with `HttpSecurityFilter` and `JWSFilter` in the same module.
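
The status-code choice can be sketched as follows. `AllowMode`, `KillEndpointPolicy`, and the loopback check are assumptions for illustration (including `ALLOW` as the permissive mode value), not Spark's implementation; `200` simply stands in for "the request proceeds".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical model of the allow-mode values; ALLOW is assumed as the permissive one.
enum AllowMode { ALLOW, LOCAL, DENY }

final class KillEndpointPolicy {
  static final int SC_OK = 200;
  static final int SC_FORBIDDEN = 403; // same value as HttpServletResponse.SC_FORBIDDEN

  private KillEndpointPolicy() {}

  // Policy/origin decision only; the HTTP method plays no role here.
  static boolean isAllowed(AllowMode mode, String remoteAddr) {
    switch (mode) {
      case ALLOW:
        return true;
      case DENY:
        return false;
      case LOCAL:
        try {
          // Assumed locality test: the caller's address is a loopback address.
          return InetAddress.getByName(remoteAddr).isLoopbackAddress();
        } catch (UnknownHostException e) {
          return false;
        }
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  // A policy rejection maps to 403, not 405: the method itself is supported.
  static int statusFor(AllowMode mode, String remoteAddr) {
    return isAllowed(mode, remoteAddr) ? SC_OK : SC_FORBIDDEN;
  }
}
```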

### Does this PR introduce _any_ user-facing change?

Yes, but only the HTTP status code changes: a disallowed `/workers/kill` request now returns `403` instead of `405`. The request is still rejected, so there is no functional behavior change.

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56573 from dongjoon-hyun/SPARK-57509.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ec83fe6)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…orker packages out of the published API

### What changes were proposed in this pull request?

Backport of apache#56551 to `branch-4.2`. Two related changes that keep internal packages out of the published 4.2.0 API surface:

1. Move the client-side Types Framework ops -- `TypeApiOps` and `TimeTypeApiOps` -- from `org.apache.spark.sql.types.ops` to `org.apache.spark.sql.catalyst.types.ops`, co-located with the server-side `TypeOps` family. Consumer imports are updated; same-package consumers drop the now-redundant import. (The `TimestampNanos*ApiOps` types moved in the master PR do not exist on `branch-4.2`, so they are not part of this backport.)
2. Exclude `org.apache.spark.udf.worker` from the generated API docs in `project/SparkBuild.scala`'s `ignoreUndocumentedPackages`.

### Why are the changes needed?

The `*ApiOps` types are internal plumbing of the Types Framework (the client-side counterpart to catalyst's `TypeOps`), but they lived inside the public `org.apache.spark.sql.types` package, so they leaked into the published PySpark/Scala API of the unreleased 4.2.0 line. `org.apache.spark.sql.catalyst.*` is already excluded from both the generated docs (`ignoreUndocumentedPackages`) and MiMa (`MimaExcludes`), so relocating them there makes them internal with no new build/MiMa entries and mirrors how the server-side `TypeOps` is already handled.

`org.apache.spark.udf.worker` is UDF-worker infrastructure (mostly protobuf-generated `*OrBuilder` Java plus worker internals) that surfaced as public API. Its modules aren't MiMa-checked, and the generated Java can't carry a Scala visibility qualifier, so excluding the package from the docs is the appropriate fix.

### Does this PR introduce _any_ user-facing change?

No. Relative to released Spark there is no change; the affected types are new in the unreleased 4.2.0 line and were never intended to be public. This only removes them from the generated API docs (and, for the ops, the binary-compatibility surface) before release. There is no behavior change.

### How was this patch tested?

No new tests -- this is a package relocation plus a build-config change with no logic change. The relocated classes are exercised by existing suites and the cast / `Row` / `HiveResult` paths; CI compiles all affected modules and runs scalastyle, which enforces the import-ordering updates made here.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

Closes apache#56571 from cloud-fan/SPARK-57496-4.2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…point in MasterWebUI

### What changes were proposed in this pull request?

This PR makes the Master Web UI's `/workers/kill` endpoint check the modify ACL, as the other kill actions (`MasterPage`, `JobsTab`, `StagesTab`) already do. For example, `MasterPage` ignores a kill request when the modify ACL check fails:

https://github.com/apache/spark/blob/28c232f295edf06836667121ad859a730f8163d4/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala#L65-L67

This is a subtask of SPARK-55556 `Improve Web Security`.

### Why are the changes needed?

`/workers/kill` was the only destructive Master UI action that did not honor the modify ACL; it was guarded only by `spark.master.ui.decommission.allow.mode` (IP-based). So with ACLs enabled, a user without the modify permission could still decommission workers.
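A minimal sketch of how the ACL check layers on top of the existing allow-mode guard, assuming a simplified model where the modify ACL is a plain set of user names. `ModifyAclSketch` and its methods are hypothetical; Spark's real `SecurityManager` check also takes admin ACLs and group ACLs into account, which this sketch omits.

```scala
object ModifyAclSketch {
  /** With ACLs disabled everyone may modify; otherwise the user must be in the modify ACL. */
  def canModify(aclsEnabled: Boolean, modifyAcls: Set[String], user: Option[String]): Boolean =
    !aclsEnabled || user.exists(modifyAcls.contains)

  /** The request proceeds only if both the allow-mode guard and the ACL check pass. */
  def shouldDecommission(
      allowModePermits: Boolean,
      aclsEnabled: Boolean,
      modifyAcls: Set[String],
      user: Option[String]): Boolean =
    allowModePermits && canModify(aclsEnabled, modifyAcls, user)
}
```

With ACLs disabled the result depends only on the allow mode, which matches the "no change with the default config" note.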

### Does this PR introduce _any_ user-facing change?

Yes. With `spark.acls.enable=true`, a `/workers/kill` request from a user not in `spark.modify.acls` is now ignored. No change with the default config.

### How was this patch tested?

Pass the CIs with newly added `MasterWebUIAclSuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56574 from dongjoon-hyun/SPARK-57510.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6b3a569)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Stop testing `labels`, because it was never officially documented for the pandas API on Spark. `matplotlib` dropped support for it in 3.11.0 (it was deprecated earlier), and pandas does not support it either.

### Why are the changes needed?

CI is failing: https://github.com/apache/spark/actions/runs/27482315740/job/81233169249

Note that this is actually a `pandas` failure, not a PySpark one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56524 from gaogaotiantian/fix-matplotlib-labels.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 42bc26d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ee635ad)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Today, when `PushVariantIntoScan` rewrites a strict variant cast/`variant_get` into a typed scan field, the cast is evaluated eagerly inside the scan. An `INVALID_VARIANT_CAST` from any row aborts the query, even when the user expression that requested the cast (e.g., a predicate that prunes the bad row) would never actually consume it.

This PR adds an opt-in (`spark.sql.variant.pushVariantIntoScan.deferCastError`, default off) that defers the cast error to the row's consumer. The mechanism:

- **Wrapper schema**: For each pushed strict-cast field `<n>`, add a new field to the variant struct schema carrying a special metadata entry `castErrorFor: <n>`. The field name recorded in this entry is used to pair the target field with its cast-error companion.
- **Reader**: `SparkShreddingUtils.assembleVariantStruct` catches `INVALID_VARIANT_CAST`, writes the offending value into `cast_error`, and leaves `field_value` null on failure (and the reverse on success).
- **Consumer**: A new Catalyst expression, `UnwrapVariantCastError(cast_error, field_value)`, is equivalent to `if(cast_error IS NOT NULL, raise_error('INVALID_VARIANT_CAST', ...), field_value)` but is kept as a single named expression so that downstream operators (e.g., the physical scan) can easily recognize it.
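The reader/consumer split can be modeled in plain Scala. This is a hypothetical sketch, not Spark's implementation: `CastSlot` stands in for the (`cast_error`, `field_value`) pair, `Option` stands in for SQL null, and string-to-int parsing stands in for a strict variant cast. It shows why deferring the error helps: a row that a filter removes is never unwrapped, so its bad value never raises.

```scala
object DeferredCastErrorSketch {
  /** Hypothetical stand-in for the (cast_error, field_value) pair the reader produces. */
  final case class CastSlot(castError: Option[String], fieldValue: Option[Int])

  /** Reader side: attempt the strict cast and record either the value or the offending input. */
  def assemble(raw: String): CastSlot =
    raw.toIntOption match {
      case Some(v) => CastSlot(castError = None, fieldValue = Some(v))
      case None    => CastSlot(castError = Some(raw), fieldValue = None)
    }

  /** Consumer side: raise only when this row's value is actually needed. */
  def unwrap(slot: CastSlot): Option[Int] = slot.castError match {
    case Some(bad) => throw new IllegalArgumentException(s"INVALID_VARIANT_CAST: $bad")
    case None      => slot.fieldValue
  }
}
```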

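### Why are the changes needed?

To ensure that users don't get surprising results when `PushVariantIntoScan` is enabled.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

### Was this patch authored or co-authored using generative AI tooling?

Yes. Co-authored with Claude Opus 4.8.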

Closes apache#56505 from chenhao-db/fix_variant_push.

Authored-by: chenhao-db <chenhao.li@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1284cfc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ngs to Oracle format

### What changes were proposed in this pull request?
Fix the Oracle TRUNC pushdown to correctly map Spark's date truncation format strings to their Oracle equivalents, instead of hardcoding `'IW'` (ISO week) for all levels.

### Why are the changes needed?
`OracleDialect.visitSQLFunction` hardcodes `'IW'` for all TRUNC pushdowns regardless of the requested truncation level. When a user writes `trunc(col, 'MONTH')`, `trunc(col, 'YEAR')`, or `trunc(col, 'QUARTER')`, the generated SQL sent to Oracle is always `TRUNC(col, 'IW')` - silently returning week-truncated dates instead of the correct truncation. Introduced in SPARK-51585.
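A sketch of a per-level mapping, assuming the four levels the test exercises and Oracle's standard `TRUNC(date)` format models (`IW`, `MM`, `Q`, `YYYY`). `OracleTruncFormatSketch` is hypothetical and is not the `OracleDialect` code. Returning `None` for an unrecognized level models declining the pushdown, which is an assumption about how unsupported levels should be handled.

```scala
import java.util.Locale

object OracleTruncFormatSketch {
  /** Hypothetical mapping from a Spark truncation level to an Oracle TRUNC(date) format model. */
  def toOracleFormat(sparkLevel: String): Option[String] =
    sparkLevel.toUpperCase(Locale.ROOT) match {
      case "WEEK"    => Some("IW")   // ISO week, starting on Monday
      case "MONTH"   => Some("MM")
      case "QUARTER" => Some("Q")
      case "YEAR"    => Some("YYYY")
      case _         => None         // unknown level: do not push down
    }

  /** Builds the Oracle SQL fragment, or None when the level cannot be translated. */
  def compileTrunc(column: String, sparkLevel: String): Option[String] =
    toOracleFormat(sparkLevel).map(fmt => s"TRUNC($column, '$fmt')")
}
```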

### Does this PR introduce _any_ user-facing change?
Yes. `trunc()` predicates pushed down to Oracle V2 data sources now produce correct results for MONTH, YEAR, and QUARTER truncation levels (previously all returned week-truncated dates).

### How was this patch tested?
Added new unit test in `JDBCSuite` that constructs TRUNC V2 expressions for each format level (WEEK, MONTH, YEAR, QUARTER), compiles them via `OracleDialect.compileExpression`, and asserts the correct Oracle format string appears in the output. The test fails without the fix (produces `'IW'` for all levels) and passes with the fix.

### Was this patch authored or co-authored using generative AI tooling?
Yes. Authored using Claude-Opus 4.6

Closes apache#56425 from shrirangmhalgi/SPARK-57364-oracle-trunc-format.

Authored-by: Shrirang Mhalgi <shrirangmhalgi@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 80ac691)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ssion` when converting an expression-backed `Column`

### What changes were proposed in this pull request?

`ColumnNodeToExpressionConverter` (sql/core .../classic/columnNodeSupport.scala) converts an expression-backed `Column` (an `ExpressionColumnNode`) into a Catalyst `Expression` and, if the wrapped expression is an `AggregateFunction`, wraps it in an `AggregateExpression` via `toAggregateExpression()`. Because `AggregateWindowFunction extends DeclarativeAggregate (which extends AggregateFunction) with WindowFunction`, a window function also matched this branch and got wrapped, producing `WindowExpression(AggregateExpression(windowFunc), spec)` instead of `WindowExpression(windowFunc, spec)`. This PR adds a guard so that an `AggregateFunction` that is also a `WindowFunction` is left untouched and used directly as the child of the `WindowExpression`. Regular aggregate functions used as window functions (e.g. `sum`) are unaffected because wrapping them in an `AggregateExpression` is correct.
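The guard can be illustrated with a toy expression hierarchy. These case classes are hypothetical stand-ins for the Catalyst types named above; the point is only the match order: a function that is also a `WindowFunction` is returned as-is, and only plain aggregates are wrapped.

```scala
object WindowGuardSketch {
  // Hypothetical stand-ins for the Catalyst hierarchy described above.
  sealed trait Expr
  trait AggregateFunction extends Expr
  trait WindowFunction extends Expr
  final case class Sum(child: String) extends AggregateFunction
  final case class RowNumber() extends AggregateFunction with WindowFunction
  final case class AggregateExpression(fn: AggregateFunction) extends Expr

  /** Wraps plain aggregates; leaves functions that are also window functions untouched. */
  def convert(e: Expr): Expr = e match {
    case _: WindowFunction      => e                        // checked first, so RowNumber is not wrapped
    case agg: AggregateFunction => AggregateExpression(agg) // e.g. sum used over a window
    case other                  => other
  }
}
```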

### Why are the changes needed?

Wrapping a window function in an `AggregateExpression` makes analysis fail: `CheckAnalysis` sees an `AggregateExpression` whose child is a `WindowFunction` but which is not itself a `WindowExpression`, and throws `WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`. As a result, a user-defined `AggregateWindowFunction` (or any built-in window function expression) wrapped into a `Column` through the `ClassicConversions` / `ColumnConversions` DeveloperApi and used with `over(...)` could not be analyzed. This blocks extension developers from plugging custom window functions in via the Column API, a pattern that worked before the Column API was decoupled from Catalyst.

### Does this PR introduce _any_ user-facing change?

Yes. Before this change, wrapping a window function expression into a `Column` and calling `.over(window)` failed analysis with `WINDOW_FUNCTION_WITHOUT_OVER_CLAUSE`; now it analyzes and executes correctly, matching the behavior of the equivalent by-name window function. This is a fix within the unreleased master branch relative to the existing Column/Catalyst conversion behavior.

### How was this patch tested?

Added two regression tests to `DataFrameWindowFunctionsSuite`: one wraps the built-in `RowNumber()` expression into a `Column` and checks it matches by-name `row_number()`, and one defines a minimal custom `AggregateWindowFunction` (`NonNullRunningCount`) and checks it produces correct results when used with `over(...)`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes apache#56570 from peter-toth/SPARK-57505-fix-custom-aggregatewindowfunction-analysis.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Peter Toth <peter.toth@gmail.com>
(cherry picked from commit 201da65)
Signed-off-by: Peter Toth <peter.toth@gmail.com>
…Factory

### What changes were proposed in this pull request?
1. `OrcPartitionReaderFactory.createORCReader`: wrap the body after `OrcFile.createReader` in a try/catch; on any exception, close the reader before re-throwing.
2. `OrcPartitionReaderFactory.buildColumnarReader`: remove the `lazy` qualifier from `val (reader, readerOptions)`, which was meaningless because `reader` is accessed on the very next line.
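The close-on-failure pattern from item 1, sketched generically with `AutoCloseable`. `CloseOnFailureSketch.openAndInit` is a hypothetical helper, not ORC or Spark API. Attaching a failed `close()` as a suppressed exception is a design choice of this sketch, so that the original error is the one re-thrown.

```scala
object CloseOnFailureSketch {
  /** Opens a resource and runs `init` on it; if `init` throws, closes the resource and re-throws. */
  def openAndInit[R <: AutoCloseable, A](open: => R)(init: R => A): (R, A) = {
    val resource = open
    try {
      (resource, init(resource)) // on success the caller owns (and later closes) the resource
    } catch {
      case t: Throwable =>
        try resource.close()
        catch { case suppressed: Throwable => t.addSuppressed(suppressed) }
        throw t
    }
  }
}
```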

### Why are the changes needed?
Reader may not be closed properly when an exception is thrown.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added `OrcPartitionReaderFactorySuite` with a test that reproduces the leak scenario.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code

Closes apache#56589 from cxzl25/SPARK-57529.

Authored-by: cxzl25 <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 37b2ac2)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ation header

### What changes were proposed in this pull request?

The Scala/JVM Spark Connect client sends the access token (from the `token` connection param) in a non-standard `Authentication` gRPC metadata header. This changes it to the standard `Authorization`, and adds a unit test.

```scala
- Metadata.Key.of("Authentication", Metadata.ASCII_STRING_MARSHALLER)
+ Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER)
```

### Why are the changes needed?

Bearer tokens belong in `Authorization` (RFC 6750). The old key breaks token auth and is inconsistent with every other consumer:
- server-side `PreSharedKeyAuthenticationInterceptor` reads the token from `Authorization`;

https://github.com/apache/spark/blob/9d0b440554fe90d1b58c712c9ad1fe3c10bde6ec/sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/PreSharedKeyAuthenticationInterceptor.scala#L28

- the local-channel path in the same client already uses `Authorization`;
- the Python client uses gRPC's `access_token_call_credentials` (i.e. `Authorization`);
- standard JWT proxies (e.g. Envoy/Istio) extract the bearer from `Authorization` and otherwise reject the request.

### Does this PR introduce _any_ user-facing change?

Yes. The Scala/JVM client now sends the token in `Authorization` instead of `Authentication`, fixing token auth against the pre-shared-key interceptor and standard JWT proxies.

### How was this patch tested?

New unit test in `SparkConnectClientSuite` asserting that `AccessTokenCallCredentials` emits `Authorization: Bearer <token>` (fails before the fix, passes after); the full suite passes.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Closes apache#56389 from j1wonpark/SPARK-57336.

Authored-by: Jiwon Park <jpark92@outlook.kr>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4351d46)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

In `YarnShuffleService.initializeApplication`, when the application data fails to parse,
the warning logged the entire `payload`. This PR logs only the application ID instead.

### Why are the changes needed?

With `spark.authenticate` enabled, the default code path sends the raw shuffle secret as
the `payload`, so the warning logged the secret in plaintext to the NodeManager logs.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Yes. Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56599 from dongjoon-hyun/SPARK-57538.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a0f9088)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

Serialize the Scala runtime-reflection calls in `ArrowDeserializers` so they are safe under concurrent Arrow deserialization.

While building deserializers, `ArrowDeserializers` uses runtime reflection in two places:
- resolving a Scala collection's **companion object** (`mirror.classSymbol(cls).companion.asModule`) to construct the collection's builder;
- resolving a Scala **`Enumeration`'s module instance** (`mirror.classSymbol(parent).module.asModule`) to deserialize enum values by name.

Both now go through a `synchronized` seam on the `ArrowDeserializers` object monitor, with a consistent two-method shape:
- `resolveCompanion(tag)` -> `resolveCompanionFromMirror(mirror, cls)` *(synchronized)*
- `resolveEnum(parent)` -> `resolveEnumFromMirror(mirror, parent)` *(synchronized)*

The thin outer methods read `currentMirror`; the inner `*FromMirror` methods take the mirror as a parameter and hold the lock around the reflection. Taking the mirror as a parameter lets the regression test drive these exact synchronized methods against a deliberately cold mirror, where the race reproduces. Both methods lock the same monitor, so the two reflection paths also serialize against each other.
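The two-method shape above can be sketched without Scala reflection. `ColdMirror` is a hypothetical stand-in for a mirror whose lookups mutate unsynchronized shared state. The outer method reads the current mirror; the inner `*FromMirror` method takes the mirror as a parameter and holds the object monitor, so a test can pass in its own fresh mirror, as the text describes.

```scala
object ReflectionSeamSketch {
  /** Hypothetical stand-in for a cold, non-thread-safe mirror: lookups mutate an unsynchronized cache. */
  final class ColdMirror {
    private val cache = scala.collection.mutable.HashMap.empty[String, String]
    def lookup(name: String): String = cache.getOrElseUpdate(name, s"module:$name")
  }

  // Plays the role of `currentMirror` in the text.
  private val current = new ColdMirror

  // Thin outer method: reads the current mirror and delegates.
  def resolve(name: String): String = resolveFromMirror(current, name)

  // Inner method: takes the mirror as a parameter and holds this object's monitor,
  // so every caller is serialized on one lock regardless of which outer method it came through.
  def resolveFromMirror(mirror: ColdMirror, name: String): String = synchronized {
    mirror.lookup(name)
  }
}
```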

### Why are the changes needed?

Scala runtime reflection is not thread-safe (scala/bug#6240). When several Arrow result batches are deserialized concurrently against a still-cold reflection symbol table, `classSymbol(...).companion` / `.module` can observe the symbol as `NoSymbol`, so `.asModule` fails with:

```
scala.ScalaReflectionException: <none> is not a module
```

The window is normally narrow -- a mirror warms up after the first resolution of each symbol -- but it is reachable in practice when multiple deserializers resolve collection/enum types at the same time. Serializing the reflection through one monitor closes it.

### Does this PR introduce _any_ user-facing change?

No. This is an internal robustness fix; the result of a successful deserialization is unchanged.

### How was this patch tested?

New `ArrowDeserializersConcurrencySuite` reproduces the race with high probability: each repetition deterministically re-opens the race window by building a runtime mirror over a fresh, cold `URLClassLoader` (parented at the platform loader so `scala.*` is reloaded cold), then drives the real synchronized reflection from 16 threads x 50 repetitions, covering both `resolveCompanionFromMirror` (~40 collection companions) and `resolveEnumFromMirror` (`Enumeration` fixtures). Both tests pass.

Validated end-to-end by toggling the fix:
- **With the fix** -- the concurrency suite is green.
- **Without the fix** (`synchronized` removed) -- the suite goes red with `scala.ScalaReflectionException: <none> is not a module`, the production symptom.
- **No functional regression** -- `ArrowEncoderSuite` stays green.

### Was this patch authored or co-authored using generative AI tooling?
Yes.

Closes apache#56435 from haiyangsun-db/SPARK-57371.

Authored-by: Haiyang Sun <haiyang.sun@databricks.com>
Signed-off-by: Herman van Hövell <herman@databricks.com>
(cherry picked from commit fff25e5)
Signed-off-by: Herman van Hövell <herman@databricks.com>
nchammas and others added 29 commits June 21, 2026 14:14
### What changes were proposed in this pull request?

This PR rewords the docstring for `NormalizeFloatingNumbers` to eliminate repeated text and add some useful references. I also tweaked the formatting of the numbered lists so they are consistent.

This is a follow-up to apache#53695.

### Why are the changes needed?

These changes make it a bit easier for future readers to understand the need for, and the context around, this optimizer rule.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56146 from nchammas/normalize-floats-docstring.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 7cc020a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…s both -0.0 and 0.0

### What changes were proposed in this pull request?
This PR fixes `mode()` (and `pandas_mode()`) returning an incorrect result when the input contains both -0.0 and 0.0. Mode/PandasMode accumulate value frequencies in an `OpenHashMap[AnyRef, Long]` keyed by the raw boxed input value. For FLOAT/DOUBLE, `java.lang.Double.equals/hashCode` are defined via `doubleToLongBits`, which distinguishes -0.0 from 0.0. As a result the frequency of a single SQL value is split across two buffer entries, and eval's `maxBy(_._2)` can then pick a value whose true frequency is strictly smaller.

The fix normalizes the floating-point component of the buffer key at update time, reusing the existing `NormalizeFloatingNumbers` helpers:
- top-level DOUBLE/FLOAT → `DOUBLE_NORMALIZER` / `FLOAT_NORMALIZER`;
- complex types containing float/double → an `UnsafeProjection` built from `NormalizeFloatingNumbers.normalize`;
- all other types → unchanged (`InternalRow.copyValue`).

### Why are the changes needed?
It is a correctness bug. Spark treats -0.0 = 0.0 under SQL/GROUP BY semantics everywhere else (e.g. GROUP BY, join keys, array_distinct, collect_set), so mode() must do the same. Today it does not:
```sql
SELECT mode(c) FROM VALUES (0.0D),(0.0D),(-0.0D),(-0.0D),(9.0D),(9.0D),(9.0D) AS t(c);
-- returns 9.0, but the correct mode is 0.0
```
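The SQL example above can be reproduced with a Java `HashMap` keyed by boxed doubles, which has the same equality problem described for `OpenHashMap[AnyRef, Long]`. This is a simplified, hypothetical sketch: `ModeKeySketch.normalize` only folds -0.0 into 0.0 and stands in for the `NormalizeFloatingNumbers` helpers; it does not reproduce them.

```scala
import java.util.{HashMap => JHashMap}
import scala.jdk.CollectionConverters._

object ModeKeySketch {
  /** Folds -0.0 into 0.0 before the value is used as a key. */
  def normalize(d: Double): Double =
    if (d == 0.0d) 0.0d // true for both 0.0 and -0.0
    else d

  /** Counts frequencies keyed by boxed doubles, then returns the most frequent key. */
  def mode(values: Seq[Double], normalizeKeys: Boolean): Double = {
    val counts = new JHashMap[java.lang.Double, java.lang.Long]()
    values.foreach { v =>
      // java.lang.Double.equals/hashCode use doubleToLongBits, so -0.0 and 0.0 are distinct keys.
      val key = java.lang.Double.valueOf(if (normalizeKeys) normalize(v) else v)
      val prev = counts.getOrDefault(key, java.lang.Long.valueOf(0L)).longValue
      counts.put(key, java.lang.Long.valueOf(prev + 1L))
    }
    counts.asScala.maxBy(_._2.longValue)._1.doubleValue
  }
}
```

Without normalization the two zero buckets each hold 2, so 9.0 (count 3) wins; with normalization the single zero bucket holds 4.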

### Does this PR introduce _any_ user-facing change?
Yes, a bug fix.

### How was this patch tested?
New test case added

### Was this patch authored or co-authored using generative AI tooling?
Yes. Claude Code

Closes apache#56382 from jiwen624/mode-float-zero-dedup.

Authored-by: Eric Yang <jiwen624@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e20fbb1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…to prevent table path loss

### What changes were proposed in this pull request?
1. Move `committer.setupJob(job)` from inside `writeAndCommit()` to before `materializeAdaptiveSparkPlan()` in `FileFormatWriter.write()`, so the output path is recreated before anything can throw.
2. Wrap the post-`setupJob` body in `try { ... } catch { committer.abortJob(job); throw }` so the staging dir is cleaned up on any failure (e.g., an AQE shuffle stage failure in `materializeAdaptiveSparkPlan`).
3. Remove the inner `try / catch + abortJob` from `writeAndCommit`, since the outer catch now handles it; this avoids calling `abortJob` twice for write/commit failures.
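The reordered control flow can be sketched with a hypothetical `Committer` trait. The names mirror the calls in the text, but this is not the `FileFormatWriter` code or Hadoop's `OutputCommitter` API. `setupJob` runs before plan materialization, and a single outer catch is the only place `abortJob` is called.

```scala
object WriteFlowSketch {
  /** Hypothetical committer exposing the three calls the text mentions. */
  trait Committer {
    def setupJob(): Unit
    def commitJob(): Unit
    def abortJob(): Unit
  }

  /** setupJob runs first; any later failure, including plan materialization, aborts exactly once. */
  def write(committer: Committer)(materializePlan: () => Unit)(writeTasks: () => Unit): Unit = {
    committer.setupJob() // recreates the output path before anything else can throw
    try {
      materializePlan()
      writeTasks()
      committer.commitJob()
    } catch {
      case t: Throwable =>
        committer.abortJob() // single abort site: cleans up the staging dir
        throw t
    }
  }
}
```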

### Why are the changes needed?
`INSERT OVERWRITE` deletes the output path before calling `write()`. When `materializeAdaptiveSparkPlan` throws (for example, on an AQE shuffle stage failure), `writeAndCommit` is never reached, so `setupJob` never recreates the path and the table path is permanently lost. The outer `try / catch` ensures that `abortJob` cleans up the staging dir (`_temporary` / `.spark-staging-*`) on any failure after `setupJob`.

### Does this PR introduce _any_ user-facing change?
Yes. Previously, a failed `INSERT OVERWRITE` with AQE could permanently delete the table path. Now the path survives the failure.

### How was this patch tested?
Added regression test in `InsertSuite` that uses a failing UDF in a shuffle stage to trigger AQE failure during `materializeAdaptiveSparkPlan`. Verifies the table path exists after the failed overwrite.

### Was this patch authored or co-authored using generative AI tooling?
Yes. Authored using Claude Opus 4.6.

Closes apache#56126 from shrirangmhalgi/SPARK-56919-setupJob-before-materialize.

Authored-by: Shrirang Mhalgi <shrirangmhalgi@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 802b0b0)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

This PR adds a new config, `spark.ui.contentSecurityPolicy.enabled` (default `false`), that gates the `Content-Security-Policy` response header introduced in SPARK-55252 and SPARK-55522. The header is now set only when the config is enabled.

### Why are the changes needed?

To make the Spark UI `Content-Security-Policy` header opt-in, disabling it by default in Apache Spark 4.2.0.

After this feature has been tested in Apache Spark 4.2.0, we can enable it by default in 4.3.0.

### Does this PR introduce _any_ user-facing change?

No. With the default configuration, the Spark UI in Apache Spark 4.2.0 does not send the `Content-Security-Policy` header, so there is no behavior change. Before this PR, sending the header unconditionally was a breaking change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56642 from dongjoon-hyun/dongjoon/eloquent-yonath-4682a1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 85fdee9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit adaaa9d)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ading past the end of a truncated trailing UTF-8 sequence

### What changes were proposed in this pull request?

`UTF8String.codePointFrom` decodes a code point by reading `numBytesForFirstByte(leader)` continuation bytes, and `copyUTF8String` copies `end - start + 1` bytes. Neither bounds the read by the bytes that actually remain, so when a string ends in a truncated multi-byte sequence (a leader byte whose declared width exceeds the remaining bytes), both read past the end of the backing memory. `trimLeft`/`trimRight` build their search character through `copyUTF8String`, so they over-read too.

This PR:
- `codePointFrom` reads continuation bytes through a small `continuationByte` helper that returns 0 once the index passes the end of the string.
- `copyUTF8String` clamps the copy length to `numBytes - start`.
- Once `copyUTF8String` stops over-reading, `trimRight` needs a matching accounting fix: it advanced `trimEnd` by the leader's declared width, which overshoots a truncated trailing character, so it now uses the actual (clamped) byte count, as `trimLeft` already does.
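The two bounds can be sketched over a byte array plus an explicit `numBytes`, similar to the sliced-backing-array setup the tests use. `Utf8BoundsSketch` is hypothetical and simplified: its leader-width table treats every invalid leader as width 1, which may differ from `UTF8String`'s own table.

```scala
object Utf8BoundsSketch {
  /** Simplified width declared by a UTF-8 leader byte (invalid leaders count as 1). */
  def numBytesForFirstByte(b: Byte): Int = {
    val u = b & 0xFF
    if (u < 0xC0) 1 else if (u < 0xE0) 2 else if (u < 0xF0) 3 else if (u < 0xF8) 4 else 1
  }

  /** Payload bits of the continuation byte at index i, or 0 once i passes the string end. */
  private def continuationByte(bytes: Array[Byte], numBytes: Int, i: Int): Int =
    if (i < numBytes) bytes(i) & 0x3F else 0

  /** Decodes the code point starting at index i, never reading at or beyond numBytes. */
  def codePointFrom(bytes: Array[Byte], numBytes: Int, i: Int): Int = {
    val b = bytes(i) & 0xFF
    def c(k: Int): Int = continuationByte(bytes, numBytes, i + k)
    numBytesForFirstByte(bytes(i)) match {
      case 1 => b
      case 2 => ((b & 0x1F) << 6) | c(1)
      case 3 => ((b & 0x0F) << 12) | (c(1) << 6) | c(2)
      case _ => ((b & 0x07) << 18) | (c(1) << 12) | (c(2) << 6) | c(3)
    }
  }

  /** Copies bytes [start, end], clamping the length to the bytes that actually remain. */
  def copyRange(bytes: Array[Byte], numBytes: Int, start: Int, end: Int): Array[Byte] = {
    val len = math.max(0, math.min(end - start + 1, numBytes - start))
    java.util.Arrays.copyOfRange(bytes, start, start + len)
  }
}
```

For a backing array `Array[Byte](0x41, 0xC3.toByte, 0xA9.toByte)` viewed as a 2-byte string, the trailing `0xA9` acts as a sentinel outside the string: the truncated leader at index 1 decodes with a zero continuation instead of picking up the sentinel.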

### Why are the changes needed?

`UTF8String` can hold malformed UTF-8 (for example, bytes from binary coercion or truncated input). For a string ending in an incomplete multi-byte sequence, these methods read out of bounds and produced wrong results: `codePointFrom` assembled a code point from adjacent memory, and `trimRight` could drop valid leading characters. Well-formed UTF-8 is unaffected, since a complete sequence never exceeds the remaining bytes.

This is a follow-up to SPARK-57507, which fixed the same kind of over-read in `reverse()`.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes incorrect results on malformed input. String operations that reach these methods (such as trimming or code-point access) no longer read past the end of a value that ends in a truncated multi-byte sequence; only previously-incorrect results change. Well-formed strings behave exactly as before.

### How was this patch tested?

Added cases to `UTF8StringSuite`:
- `testCodePointFrom`: truncated trailing 2-, 3-, and 4-byte leaders, including a 4-byte leader with only the last continuation byte missing.
- `copyUTF8StringClampsToRemainingBytes`: an `end` one past the last byte, with a non-zero start so the clamp must use `numBytes - start`.
- `trimTruncatedTrailingSequence`: trimming a truncated trailing leader keeps the valid preceding character.

Each uses a sliced backing array with a trailing sentinel byte, so the previous over-read produces a deterministically wrong value; the cases fail on the old code and pass with the fix. `build/sbt 'unsafe/testOnly *UTF8StringSuite'` passes (51 tests).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56585 from LuciferYang/SPARK-57520-utf8-overread.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 33ae7f6)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…-correct logical type

### What changes were proposed in this pull request?

This is a backport of apache#56633 (master commit 24e0663) to `branch-4.2`.

It makes Spark's Avro encoding of the `TIME` data type unit-correct. `TimeType` is represented internally as nanoseconds-since-midnight, but the Avro path annotated the column with the `time-micros` logical type while writing the raw nanosecond value, so the declared unit (microseconds) did not match the stored unit (nanoseconds).

The fix converts the value to match the logical type:
- Write path (`AvroSerializer`): `nanos -> micros` (`DateTimeUtils.nanosToMicros`) before writing under `time-micros`.
- Read path (`AvroDeserializer`): `micros -> nanos` (`DateTimeUtils.microsToNanos`) when reading a `time-micros` value into `TimeType`.

`SchemaConverters` is unchanged: `time-micros` is the correct unit-matching logical type for precision 0-6, and the `spark.sql.catalyst.type` property continues to carry precision fidelity for Spark-to-Spark round-trips.
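The unit conversion can be checked with plain arithmetic. `TimeAvroSketch` is hypothetical. It assumes floor division for nanos to micros (for non-negative time-of-day values this is the same as truncation) and does not claim to match `DateTimeUtils` exactly. The range check shows why mislabeled nanoseconds break external readers: noon expressed in nanoseconds is far above the number of microseconds in a day.

```scala
object TimeAvroSketch {
  val NanosPerMicro: Long = 1000L
  val MicrosPerDay: Long = 86400L * 1000L * 1000L

  /** Write path: TimeType nanos-of-day -> value stored under time-micros (assumed floor division). */
  def nanosToMicros(nanos: Long): Long = Math.floorDiv(nanos, NanosPerMicro)

  /** Read path: time-micros value -> TimeType nanos-of-day. */
  def microsToNanos(micros: Long): Long = Math.multiplyExact(micros, NanosPerMicro)

  /** What a reader honoring time-micros expects: a microsecond offset within one day. */
  def isValidTimeMicros(micros: Long): Boolean = micros >= 0L && micros < MicrosPerDay
}
```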

Backport note: the production change applied cleanly. The only cherry-pick conflict was in `AvroSuite.scala`, because the master change places the TIME tests next to the `SPARK-57166: nanosecond timestamp types are not supported in Avro` test, which does not exist on `branch-4.2` (nanosecond timestamp types are a master/`branch-4.x` feature). Resolved by placing the five TIME tests in the base `AvroSuite` (so they run under both `AvroV1Suite` and `AvroV2Suite`) and omitting the unrelated `SPARK-57166` test. The net production/test logic is identical to master.

### Why are the changes needed?

Any external Avro reader (Hive, Trino, Flink, fastavro, etc.) that honors the `time-micros` logical type would decode a Spark-written `TIME` column as microseconds-since-midnight while it actually held nanoseconds-since-midnight - a 1000x error that also falls outside the valid micros-of-day range. The TIME-in-Avro support is present in the 4.2 line (SPARK-54473), so the bug needs to be fixed before 4.2.0 GA.

### Does this PR introduce _any_ user-facing change?

Yes, within the unreleased 4.2.0 line. The on-disk encoding of a `TIME` column written via Avro changes from raw nanoseconds (mislabeled as `time-micros`) to actual microseconds under `time-micros`. Avro files written by earlier unreleased 4.2 builds are intentionally not migrated (accepted break). Spark-to-Spark read/write of `TIME` over Avro continues to round-trip correctly.

### How was this patch tested?

Ran the TIME Avro tests on this `branch-4.2` backport: `AvroV1Suite`/`AvroV2Suite` ("TIME type read/write with Avro format", "TIME type in nested structures in Avro", "TIME type precision metadata is preserved in Avro", "SPARK-57581: TIME is written as unit-correct time-micros for external readers", "SPARK-57581: TIME read from a plain time-micros Avro file (no catalyst prop)") - 10 tests (5 x V1/V2), all pass. `dev/scalastyle` is clean.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes apache#56651 from MaxGekk/fix-time-avro-4.2.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
…er concurrent queries

### What changes were proposed in this pull request?

This change adds `PartitionKeyedAccumulator`, a `ConcurrentHashMap`-backed accumulator keyed by partition id with last-write-wins merge semantics, and replaces the counter-based accumulators in `CachedRDDBuilder` (used in `InMemoryRelation`) with it. The cached relation now:
- counts the DISTINCT materialized partition ids (the accumulator key set) when deciding whether the cache is fully loaded, so duplicate computes cannot inflate the count; and
- derives exact, de-duplicated row-count and size stats by folding the per-partition values, counting each partition once.

The behavior is gated by a new internal conf, `spark.sql.inMemoryColumnarStorage.distinctPartitionTracking` (default true); setting it to false restores the prior raw task-completion-count behavior. `clearCache` resets the bookkeeping so a rebuilt cache starts clean.
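A minimal sketch of partition-keyed bookkeeping, assuming one stats record per partition. `PartitionKeyed` is hypothetical and much simpler than the real `PartitionKeyedAccumulator`: it is not wired into Spark's accumulator framework and has no merge step. It shows why keying by partition id makes duplicate computes of the same partition harmless for both the loaded check and the row totals.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object PartitionStatsSketch {
  final case class PartStats(rows: Long, bytes: Long)

  /** Hypothetical per-partition bookkeeping: later writes for a partition replace earlier ones. */
  final class PartitionKeyed {
    private val byPartition = new ConcurrentHashMap[Int, PartStats]()

    // Last write wins, so a duplicate compute of a partition cannot double-count it.
    def add(partitionId: Int, stats: PartStats): Unit = { byPartition.put(partitionId, stats); () }

    def distinctPartitions: Int = byPartition.size()

    // Loaded only once every partition id has reported at least once.
    def isLoaded(numPartitions: Int): Boolean = distinctPartitions == numPartitions

    def totalRows: Long = byPartition.values().asScala.map(_.rows).sum
    def totalBytes: Long = byPartition.values().asScala.map(_.bytes).sum

    // Mirrors resetting the bookkeeping when the cache is cleared.
    def reset(): Unit = byPartition.clear()
  }
}
```

Under a raw completion counter, two completions of empty partition 0 would already reach a partition total of 2; here they count as one distinct partition.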

### Why are the changes needed?

Fix a bug (introduced in apache#39624 and apparently present since that change) where `InMemoryRelation` can be marked as materialized prematurely under concurrent queries:

- AQE creates a separate `TableCacheQueryStageExec` for every reference to the same `df.cache` (never reused), and each one submits its own build job over the *shared cache RDD*.
- When concurrent queries reference the same cached relation, several jobs first-touch the cold cache at once. Spark has no global, cross-executor "compute this partition once" barrier (only a per-executor write lock), so the same partition can be computed by multiple executors. `CachedRDDBuilder.isCachedRDDLoaded` decided whether the cache was materialized by comparing the partition count against a *raw task-completion count*. Duplicate completions of an empty-output partition could push that count up to the partition total while a row-producing partition was still being built, so the cache latched as "loaded" with `rowCount == 0`.
- This can produce incorrect results: `AQEPropagateEmptyRelation` then ("correctly", given the stats it was told) collapses the cache branch to an `EmptyRelation`, silently dropping rows.
- Additional latent bugs:
  - the size/row-count accumulators could be over-counted
  - the accumulators were not reset on `clearCache`

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- `PartitionKeyedAccumulatorSuite` - accumulator semantics (last-write-wins add/merge, distinct key count, snapshot/reset).
- `ConcurrentInMemoryRelationSuite` - local-cluster reproduction: rows are preserved under concurrent first-touch with the fix on; stats are exact under duplicate cross-executor computes; and a negative control showing the row loss with the fix disabled.
- Extended `CachedTableSuite` (clearCache resets bookkeeping) and `InMemoryColumnarQuerySuite` (size/row-count read through the new accessors).

### Was this patch authored or co-authored using generative AI tooling?
Yes

Closes apache#56620 from liuzqt/SPARK-57547.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ge in TruncTimestamp overflow test on JDK 25

### What changes were proposed in this pull request?

Make `DateExpressionsSuite."TruncTimestamp of Long.MinValue overflows with ArithmeticException"`
tolerant of a null exception message under JDK 25.

### Why are the changes needed?

On the branch-4.2 `Build (branch-4.2, Scala 2.13, JDK 25)` daily build, this test failed. The
overflow from `Math.*Exact` in `TruncTimestamp` does throw a raw `java.lang.ArithmeticException`, but
on JDK 25 the JIT "hot throw" optimization (OmitStackTraceInFastThrow, JDK-8367990) can throw it with a
`null` message in codegen mode, while the interpreter reports `"long overflow"`. The original
assertion required a non-null message, so it failed on JDK 25.

The fix evaluates the expression directly under both `NO_CODEGEN` and `CODEGEN_ONLY` modes, asserts the
exception is an `ArithmeticException`, and tolerates a null-or-"overflow" message (matching the existing
convention used by the SubtractTimestamps / TimestampAddInterval overflow checks in the same suite). The
engine is correct (it does throw on overflow); this is a test-only change.
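
A minimal Java sketch of the tolerant check follows. `OverflowAssertSketch` and its methods are hypothetical and not the suite's actual helpers. The check accepts an `ArithmeticException` whose message is either `null` (the JIT fast-throw case) or mentions `"overflow"` (for example the interpreter's `"long overflow"`).

```java
// Hypothetical helper mirroring the tolerant assertion; not Spark's test code.
final class OverflowAssertSketch {
    // True for an ArithmeticException with a null message or one mentioning "overflow".
    static boolean isExpectedOverflow(Throwable t) {
        if (!(t instanceof ArithmeticException)) {
            return false; // also covers t == null
        }
        String msg = t.getMessage();
        return msg == null || msg.contains("overflow");
    }

    // Runs the action and returns what it threw, or null if it completed normally.
    static Throwable thrownBy(Runnable action) {
        try {
            action.run();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }
}
```

Usage: `isExpectedOverflow(thrownBy(() -> Math.subtractExact(Long.MIN_VALUE, 1L)))` holds whether or not the JVM populated the exception message.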

### Does this PR introduce any user-facing change?

No. Test-only.

### How was this patch tested?

**Validated GREEN on the fork:** the `Build (branch-4.2, Scala 2.13, JDK 25)` build is green with this
fix, including the `api, catalyst, hive-thriftserver` job where `TruncTimestamp` runs:
https://github.com/HyukjinKwon/spark/actions/runs/27942948687

The second commit is a clearly-marked `[DO NOT MERGE]` CI trigger edit for fork validation and should be
dropped before merge.

### Was this patch authored or co-authored using generative AI tooling?

Yes. Generated with Claude Code (Anthropic) under the direction of the repository owner.

Closes apache#56665 from HyukjinKwon/ci-fix/branch42.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <hyukjin.kwon@databricks.com>
This is a follow-up of apache#52729 (which introduced `ViewInfo extends TableInfo`).

`ViewInfo IS-A TableInfo` was never a true relationship -- a view is not a table -- and it
forced table-only state (`partitions()`/`constraints()`, `withPartitions`/`withConstraints`/
`withProvider`/`withLocation`) onto the view surface. It also led `MetadataTable` to do double
duty: a delegating table *and* a carrier that smuggled a view through the `Table` interface.

This PR models a table and a view as the two kinds of a new `Relation` type, mirroring Postgres
`pg_class` + `relkind`:

```
interface Relation     // columns(), properties()
  Table  (existing, now `extends Relation`)  -- an object Spark reads from / writes to
  View   (renamed from ViewInfo)             -- metadata; Spark expands its query text
```

The two kinds are deliberately asymmetric, matching how Spark treats them:

- A view has no object form (Spark never builds one), so its info *is* the relation:
  `ViewInfo` is renamed to `View` and `implements Relation`. `V1ViewInfo` -> `V1View`.
- A table is realized from metadata: `TableInfo` stays metadata-only and is adapted into a
  `Table` by `DelegatingTable` (renamed from `MetadataTable`, now table-only -- it never carries
  a view). A catalog with its own `Table` object returns that instead.
- `RelationInfo` is dropped; the shared builder logic lives in a package-private
  `RelationBuilder` base.

Catalog API:

- `TableViewCatalog` -> `RelationCatalog`, and `loadTableOrView` -> `loadRelation(Identifier):
  Relation`. The default `loadTable` / `loadView` / `tableExists` / `viewExists` discriminate via
  `instanceof Table` / `instanceof View`. `listTableAndViewSummaries` -> `listRelationSummaries`.
- `ViewCatalog.loadView` / `createView` / `replaceView` / `createOrReplaceView` now speak `View`.
- The resolver (`Analyzer`, `RelationResolution`) carries `Relation`, so a pure `ViewCatalog`
  returns its `View` directly instead of wrapping it in a `MetadataTable`. A view is never
  represented as a `Table`.

A combined catalog stays easy to implement: from `loadRelation`, return your own `Table` (or a
`DelegatingTable` over a `TableInfo`) for a table, and a `View` for a view.
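
The shape of the new hierarchy can be sketched in Java (assuming Java 17). The interfaces below are simplified stand-ins: the real ones take an `Identifier`, carry richer metadata, and differ in signatures and error handling. `queryText()`, `SimpleTable`, `SimpleView`, and `InMemoryRelationCatalog` are hypothetical. The sketch shows the default methods discriminating by kind with `instanceof` on top of a single `loadRelation` lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the interfaces described above (not the real signatures).
interface Relation {
    List<String> columns();
    Map<String, String> properties();
}

// An object Spark reads from / writes to.
interface Table extends Relation {}

// Metadata only; Spark expands the query text. queryText() is a hypothetical accessor.
interface View extends Relation {
    String queryText();
}

record SimpleTable(List<String> columns, Map<String, String> properties) implements Table {}

record SimpleView(List<String> columns, Map<String, String> properties, String queryText)
        implements View {}

interface RelationCatalog {
    // One lookup returns either kind; in this sketch a missing name yields null.
    Relation loadRelation(String ident);

    default Table loadTable(String ident) {
        Relation r = loadRelation(ident);
        if (r instanceof Table) {
            return (Table) r;
        }
        throw new IllegalArgumentException("Not a table: " + ident);
    }

    default View loadView(String ident) {
        Relation r = loadRelation(ident);
        if (r instanceof View) {
            return (View) r;
        }
        throw new IllegalArgumentException("Not a view: " + ident);
    }

    default boolean tableExists(String ident) {
        return loadRelation(ident) instanceof Table; // false for null
    }

    default boolean viewExists(String ident) {
        return loadRelation(ident) instanceof View;
    }
}

// A combined catalog only has to implement loadRelation.
final class InMemoryRelationCatalog implements RelationCatalog {
    private final Map<String, Relation> relations = new HashMap<>();

    void put(String ident, Relation relation) {
        relations.put(ident, relation);
    }

    @Override
    public Relation loadRelation(String ident) {
        return relations.get(ident);
    }
}
```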

`Relation` is left un-sealed: `Table` is independently an open interface (so a closed hierarchy
adds little), and a sealed Java interface trips Scala's pattern-match analysis.

Modeling `Relation = Table | View` removes the `ViewInfo IS-A TableInfo` lie and the
view-as-Table smuggling, keeps `DelegatingTable` single-purpose, and keeps each builder scoped to
exactly what its kind needs -- while preserving the single-RPC `loadRelation` perf path.

There is no user-facing change: the affected APIs (`Relation`, `Table`, `View`, `TableInfo`, `DelegatingTable`,
`RelationCatalog`, `ViewCatalog`) are `Evolving` and unreleased (`since 4.2.0`).

Tested with existing tests, updated for the new types. Ran `DataSourceV2MetadataViewSuite`,
`DataSourceV2MetadataTableSuite`, `MetricViewV2CatalogSuite`, the v2 view DDL suites,
`SimpleSQLViewSuite`, `PersistedViewTestSuite`, and the temp-view suites -- all passing.
`catalyst` and `sql/core` compile (main + test).

Generated-by: Claude Code (Opus 4.8)

Closes apache#56606 from cloud-fan/view-info-relation-info-refactor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1c997dc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…e History Server

### What changes were proposed in this pull request?

This PR makes the Spark Connect UI tab render in the Spark History Server (SHS). Two changes:

1. Register `SparkConnectServerHistoryServerPlugin` via the `AppHistoryServerPlugin` SPI (a new `META-INF/services/org.apache.spark.status.AppHistoryServerPlugin` resource in the `connect/server` module), so that SHS discovers it through `ServiceLoader`, as the SQL, Streaming and Hive Thrift Server plugins already do.
2. Make `SparkConnectServerListener` read its configuration from the `SparkConf` passed to its constructor instead of `SparkEnv.get.conf`.

### Why are the changes needed?

The Spark Connect UI page added in SPARK-44394 works on a live driver but never appears in the History Server:

- `SparkConnectServerHistoryServerPlugin` is implemented but is not registered in `META-INF/services`, so SHS never loads it via `ServiceLoader`. The live UI works only because `SparkConnectService` registers the tab and listener directly.
- Even once the plugin is registered, the listener's constructor reads configuration via `SparkEnv.get.conf`. There is no active `SparkEnv` during SHS replay, so `SparkEnv.get` returns `null` and the listener throws an NPE in `FsHistoryProvider.rebuildAppStore`, failing the whole application UI with HTTP 500:

```
java.lang.NullPointerException: Cannot invoke "org.apache.spark.SparkEnv.conf()" because the return value of "org.apache.spark.SparkEnv$.get()" is null
	at org.apache.spark.sql.connect.ui.SparkConnectServerListener.<init>(SparkConnectServerListener.scala)
	at org.apache.spark.sql.connect.ui.SparkConnectServerHistoryServerPlugin.createListeners(SparkConnectServerHistoryServerPlugin.scala)
	at org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala)
```
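
The listener change amounts to constructor injection of configuration, roughly as in this hypothetical Java sketch. `ConnectUiListenerSketch` is not Spark's class. The configuration keys are assumed names for the Connect UI retention limits, and the fallback of `200` is illustrative only. Because nothing reads a process-wide singleton, construction succeeds even where no `SparkEnv` exists.

```java
import java.util.Map;

// Hypothetical listener: configuration comes only from the constructor argument.
final class ConnectUiListenerSketch {
    // Assumed key names for the UI retention limits.
    static final String RETAINED_SESSIONS = "spark.sql.connect.ui.retainedSessions";
    static final String RETAINED_STATEMENTS = "spark.sql.connect.ui.retainedStatements";

    private final int sessionLimit;
    private final int statementLimit;

    ConnectUiListenerSketch(Map<String, String> conf) {
        // Illustrative fallback of 200 when a key is not set.
        this.sessionLimit = Integer.parseInt(conf.getOrDefault(RETAINED_SESSIONS, "200"));
        this.statementLimit = Integer.parseInt(conf.getOrDefault(RETAINED_STATEMENTS, "200"));
    }

    int sessionLimit() {
        return sessionLimit;
    }

    int statementLimit() {
        return statementLimit;
    }
}
```

A test can then pass the configuration it needs directly to the constructor, which is what the updated test helper does with the UI-retention configs.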

### Does this PR introduce _any_ user-facing change?

Yes. The Spark Connect tab (sessions / executions) now renders in the History Server when replaying event logs of a Spark Connect application. Previously the tab was never shown there. This is a change relative to released versions and master.

### How was this patch tested?

- Added a unit test in `SparkConnectServerListenerSuite` that constructs the listener with no active `SparkEnv` (simulating History Server replay). It reproduces the NPE before the fix and passes after.
- Updated the existing test helper to set the UI-retention configs on the `SparkConf` passed to the listener (previously set on `SparkEnv.get.conf`, which the listener no longer reads).
- Manually verified on a History Server that the Connect tab renders from Spark Connect event logs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56682 from j1wonpark/SPARK-57601.

Authored-by: Jiwon Park <jpark92@outlook.kr>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit dc1a5fe)
Signed-off-by: Cheng Pan <chengpan@apache.org>
…26.2-slim`

### What changes were proposed in this pull request?
This PR aims to upgrade the `sarutak/oracle-free` Docker image to `23.26.2-slim`.
This image is built from [a forked repository](https://github.com/sarutak/oci-oracle-free/tree/password-initialization-timeout) from [gvenzl/oci-oracle-free](https://github.com/gvenzl/oci-oracle-free), which includes fixes for the flakiness of `OracleIntegrationSuite` and `OracleJoinPushdownIntegrationSuite`.

* sarutak/oci-oracle-free@7d10cc6
* sarutak/oci-oracle-free@25427b3

### Why are the changes needed?
* Ensure the JDBC source works with the newer Oracle Database.
* Fix the flakiness of `OracleIntegrationSuite` and `OracleJoinPushdownIntegrationSuite`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#56699 from sarutak/upgrade-oracle-free-23.26.2.

Authored-by: Kousuke Saruta <sarutak@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 01b964a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ot mask the original error when registration fails

### What changes were proposed in this pull request?
`add_pipeline_analysis_context` sets `extension_id = None` up front and only assigns it after the extension is registered, but the `finally` block always calls `client.remove_user_context_extension(extension_id)`. If any step before that assignment fails, `extension_id` is still `None`, and `remove_user_context_extension(None)` runs `None.find(...)` and raises `AttributeError`, which hides the real failure.

This change runs the cleanup only when an extension was actually registered:

```python
finally:
    if extension_id is not None:
        client.remove_user_context_extension(extension_id)
```

### Why are the changes needed?
A failure while setting up the analysis context (for example the proto import or `add_threadlocal_user_context_extension`) currently surfaces as `AttributeError: 'NoneType' object has no attribute 'find'` instead of the error that actually caused it.
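
The JVM has the same hazard. An exception thrown from a `finally` block replaces the in-flight one entirely, so an unguarded cleanup hides the real failure. The hypothetical Java analogue below is not the Python code under change: `register` and `remove` are illustrative stand-ins for registering and removing the extension, and `remove(null)` fails the way `None.find(...)` does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogue of the try/finally cleanup; names are illustrative.
final class CleanupGuardSketch {
    static final List<String> removed = new ArrayList<>();

    static String register(boolean fail) {
        if (fail) {
            throw new IllegalStateException("registration failed");
        }
        return "ext-1";
    }

    static void remove(String id) {
        removed.add(id.trim()); // dereferencing null throws NullPointerException
    }

    // Unguarded: if register() throws, remove(null) throws an NPE that replaces it.
    static void unguarded(boolean fail) {
        String id = null;
        try {
            id = register(fail);
        } finally {
            remove(id);
        }
    }

    // Guarded: cleanup runs only when something was actually registered.
    static void guarded(boolean fail) {
        String id = null;
        try {
            id = register(fail);
        } finally {
            if (id != null) {
                remove(id);
            }
        }
    }
}
```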

### Does this PR introduce _any_ user-facing change?
No. It only changes which exception is raised when setup fails: the original one rather than the `AttributeError` that was masking it.

### How was this patch tested?
Added a test that makes extension registration raise and checks that the original error propagates and that no extension is left registered. It fails without the fix and passes with it.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56652 from LuciferYang/sdp-analysis-context-finally-guard.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 6c2a126)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…t from the default value of `maxPartitionBytes`

### What changes were proposed in this pull request?

This PR aims to make `FileSourceStrategySuite` independent of the default value of `maxPartitionBytes`.

### Why are the changes needed?

The following `maxPartitionNum` test case fails as shown below when the default value of `maxPartitionBytes` changes. A test case that assumes particular configuration values should set them explicitly.

```
[info] - SPARK-44021: Test spark.sql.files.maxPartitionNum works as expected *** FAILED *** (33 milliseconds)
[info]   300000 did not equal 150000 (FileSourceStrategySuite.scala:613)
```

Since the current default value of `maxPartitionBytes` is `128MB`, this PR sets it to `128MB` explicitly in the test.

https://github.com/apache/spark/blob/f2cbc7803c18648130db862cd85581a57c1dad98/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2721-L2727

### Does this PR introduce _any_ user-facing change?

No, this is a test case fix.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56717 from dongjoon-hyun/SPARK-57647.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0cc4d05)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ECK constraint

### What changes were proposed in this pull request?

This PR makes `predicateSql` a mandatory field of the DSv2 `Check` constraint (`org.apache.spark.sql.connector.catalog.constraints.Check`).

Previously, `Check.Builder.build()` only rejected the case where **both** `predicateSql` and `predicate` were `null`, which allowed a `Check` to be constructed with only a structured `predicate`. This PR tightens the validation so that `predicateSql` must always be provided. `predicate` remains optional and is the structured form used when the condition can be expressed with supported expressions.

Specifically:
- `Check.Builder.build()` now throws when `predicateSql` is `null`, regardless of `predicate`.
- `Check.definition()` is simplified to always render `predicateSql` (the previous fallback to `predicate` is dead code now that `predicateSql` is guaranteed to be present).
- Javadoc on the class, `predicateSql()`, and `predicate()` is updated to document that `predicateSql` is the canonical representation and is always present, while `predicate` is optional and may be `null`.
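
The tightened validation can be sketched with a hypothetical stand-in (`CheckSketch` is not the real `Check`). The real builder fails with Spark's `INTERNAL_ERROR` rather than `IllegalStateException`, and it types the structured form as a `Predicate`, which this self-contained sketch replaces with `Object`. The `definition()` format is likewise illustrative.

```java
// Hypothetical stand-in for Check and its builder.
final class CheckSketch {
    private final String name;
    private final String predicateSql; // canonical form, always present
    private final Object predicate;    // optional structured form, may be null

    private CheckSketch(String name, String predicateSql, Object predicate) {
        this.name = name;
        this.predicateSql = predicateSql;
        this.predicate = predicate;
    }

    String name() { return name; }

    String predicateSql() { return predicateSql; }

    Object predicate() { return predicate; }

    // Always renders the SQL text; no fallback to the structured predicate is needed.
    String definition() {
        return "CHECK (" + predicateSql + ")";
    }

    static final class Builder {
        private final String name;
        private String predicateSql;
        private Object predicate;

        Builder(String name) { this.name = name; }

        Builder predicateSql(String sql) { this.predicateSql = sql; return this; }

        Builder predicate(Object p) { this.predicate = p; return this; }

        CheckSketch build() {
            // Previously rejected only when both were null; now predicateSql is mandatory.
            if (predicateSql == null) {
                throw new IllegalStateException("CHECK constraint " + name + " requires predicateSql");
            }
            return new CheckSketch(name, predicateSql, predicate);
        }
    }
}
```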

### Why are the changes needed?

`predicateSql` is the canonical representation of a CHECK condition. Spark always populates it from the original SQL text in `CheckConstraint.toV2Constraint`, while `predicate` is only set when the condition can be translated to a supported `Predicate` (it is `null` otherwise, e.g. for `from_json(j, 'a INT').a > 1`).

Several read paths already assume `predicateSql` is present. For example, `ResolveTableConstraints.buildCatalystExpression` prefers the structured `predicate` but falls back to parsing `predicateSql`:

```scala
Option(c.predicate())
  .flatMap(V2ExpressionUtils.toCatalyst)
  .getOrElse(catalogManager.v1SessionCatalog.parser.parseExpression(c.predicateSql()))
```

If a connector were to build a `Check` with a `null` `predicateSql` and a `predicate` that cannot be converted back by `V2ExpressionUtils.toCatalyst`, this would fall through to `parseExpression(null)` and fail with an NPE. `predicateSql` is also used as the human-readable condition in CHECK violation error messages. Requiring `predicateSql` makes the invariant explicit and keeps these paths safe.

### Does this PR introduce _any_ user-facing change?

No. `Check` is an `Evolving` DSv2 API, and Spark itself always sets `predicateSql`, so no existing Spark behavior changes. The only effect is tighter validation for connector authors who construct `Check` directly: building a `Check` without `predicateSql` now fails fast with a clear error instead of producing a constraint that downstream code already assumes is invalid.

### How was this patch tested?

Updated `ConstraintSuite`:
- The existing "CHECK constraint toDDL" `con2` case now also supplies `predicateSql` (it previously relied on the predicate-only path).
- Added "CHECK constraint requires predicateSql", asserting that `build()` fails with `INTERNAL_ERROR` when `predicateSql` is absent, both when no condition is supplied at all and when only a `predicate` is supplied.

```
build/sbt 'catalyst/testOnly *ConstraintSuite'
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Closes apache#56711 from gengliangwang/spark-57642.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 1f185c0)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…en latest partition offsets are unavailable

### What changes were proposed in this pull request?

Fix a `NullPointerException` in the Kafka micro-batch source when custom metrics are reported before the latest partition offsets are known.

In the non-real-time-mode branch of the instance `metrics()`, `latestPartitionOffsets` (a `var` initialized to null until `latestOffset()` runs) was wrapped as `Some(latestPartitionOffsets)`, producing `Some(null)`. The companion `metrics(...)` then evaluated `isDefined` as true, called `.get` (null), and invoked `.map` on it, throwing. The fix changes that single wrap to `Option(latestPartitionOffsets)`, so a null becomes `None` and the metrics computation is correctly skipped.
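
In Scala, `Some(x)` wraps even a `null`, while `Option(x)` maps `null` to `None`. Java's `Optional.of(null)` throws immediately instead of producing a `Some(null)` analogue, but `Optional.ofNullable` plays the role of `Option(...)`. The hypothetical sketch below is not `KafkaMicroBatchStream`, and the `numPartitions` metric is made up. It shows the fixed pattern: a null field yields an empty metrics map instead of a dereference.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical source whose offsets field stays null until latestOffset() runs.
final class OffsetMetricsSketch {
    private Map<String, Long> latestPartitionOffsets; // null until first latestOffset()

    void setLatestPartitionOffsets(Map<String, Long> offsets) {
        this.latestPartitionOffsets = offsets;
    }

    // ofNullable turns null into an empty Optional, so the metric computation is skipped.
    Map<String, String> metrics() {
        return Optional.ofNullable(latestPartitionOffsets)
            .map(offsets -> Map.of("numPartitions", String.valueOf(offsets.size())))
            .orElse(Collections.emptyMap());
    }
}
```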

### Why are the changes needed?

`metrics()` can be called before the first batch completes (e.g. progress reporting), while `latestPartitionOffsets` is still null:

```
java.lang.NullPointerException: Cannot invoke "scala.collection.IterableOps.map(scala.Function1)"
  because the return value of "scala.Option.get()" is null
    at org.apache.spark.sql.kafka010.KafkaMicroBatchStream$.metrics(KafkaMicroBatchStream.scala:520)
```

This was introduced by apache#52729 (SPARK-54027, Kafka RTM support).

### Does this PR introduce _any_ user-facing change?

No behavioral change for well-formed inputs. It prevents the streaming query from crashing when metrics are requested before the latest partition offsets are available; in that case an empty metrics map is returned, as intended.

### How was this patch tested?

Added a test in `KafkaMicroBatchSourceSuite` that drives the instance `metrics()` while `latestPartitionOffsets` is still null (the stream is built but `latestOffset()` has not run), asserting an empty map and no NPE. Verified the test fails with the original NPE if the call-site fix is reverted to `Some(...)`.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

Closes apache#56526 from yadavay-amzn/fix/SPARK-57438-kafka-metrics-npe.

Authored-by: Anupam Yadav <anupamya@amazon.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 1125957)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
…nClusterSuite to fix ACCEPTED-state timeouts

### What changes were proposed in this pull request?

`BaseYarnClusterSuite` configures a mini `CapacityScheduler` but never sets `yarn.scheduler.capacity.maximum-am-resource-percent`, so it defaults to `0.1`. On memory-constrained CI runners the queue's total AM resource budget becomes ~1GB, which is smaller than the 1–2GB AM/driver memory these tests request. Applications then wedge in the `ACCEPTED` state (never activated) and the suite times out after 3 minutes with `handle.getState().isFinal() was false`.

This sets `maximum-am-resource-percent` to `1.0` (global + `root.default`) so AMs can use the whole test queue and applications are always activated.

### Why are the changes needed?

`YarnClusterSuite` fails 6 tests with a 3-minute `eventually` timeout on the scheduled Maven builds (`resource-managers#yarn` module):
- run Spark in yarn-client/cluster mode with different configurations, ensuring redaction
- yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
- SPARK-35672: additional jar using URI scheme 'local' (client, cluster, client + gateway-replacement)

The YARN diagnostics show `Queue's AM resource limit exceeded. AM Resource Request = <memory:2048>; Queue Resource Limit for AM = <memory:1024>` repeated >1000 times.

**Failing job (before):** https://github.com/apache/spark/actions/runs/28045133937/job/83029837948 — `Build / Maven (branch-4.2, Scala 2.13, JDK 21)`, `resource-managers#yarn` (6 failures).

**Passing job (with this fix):** https://github.com/HyukjinKwon/spark/actions/runs/28066027338/job/83090387029 — `resource-managers/yarn` tests: `YarnClusterSuite` 30/30 pass, the 6 formerly-failing tests now complete in ~11s each (was 180s timeout).

### Does this PR introduce any user-facing change?

No. Test-only.

### How was this patch tested?

Ran the `resource-managers/yarn` module tests on a fork (link above); `YarnClusterSuite` passes 30/30.

This pull request and its description were written by Isaac.

Closes apache#56715 from HyukjinKwon/ci-fix/yarn-cluster-am-resource-percent.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <hyukjin.kwon@databricks.com>
…lt value of `maxShuffledHashJoinLocalMapThreshold`

### What changes were proposed in this pull request?

This PR aims to make SQL tests independent of the default value of `maxShuffledHashJoinLocalMapThreshold` by pinning it to the current default value.

https://github.com/apache/spark/blob/9dab83b728705157ba4511c6d3b5b8244089a6f5/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L1302-L1311

### Why are the changes needed?

These suites assume AQE does not prefer a shuffled hash join, which only holds while this threshold
defaults to `0`. If the default is raised, AQE picks a shuffled hash join over a sort merge join and
breaks them (e.g. SMJ-spill checks, extended explain output, plan-tree node ids). A test that assumes a config value should set it explicitly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes apache#56747 from dongjoon-hyun/SPARK-57667.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d55264e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3186e1e)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…CheckpointV2` as `ExtendedSQLTest`

### What changes were proposed in this pull request?

This PR aims to mark `StateDataSourceTransformWithStateSuiteCheckpointV2` as `ExtendedSQLTest`.

### Why are the changes needed?

To rebalance the SQL test jobs. Currently, the `extended tests` job is shorter. Since `StateDataSourceTransformWithStateSuiteCheckpointV2` is the most time-consuming test suite, taking over 8 minutes (484s), this PR moves it from `other tests` to `extended tests` by marking it as `ExtendedSQLTest`.

- https://github.com/apache/spark/actions/runs/28099615476

<img width="719" height="133" alt="Screenshot 2026-06-24 at 14 11 38" src="https://github.com/user-attachments/assets/22192cd6-c1fc-49e6-b79c-0543f931066b" />

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes apache#56753 from dongjoon-hyun/SPARK-57675.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit afe9339)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…WithRowChecksum` to `ExtendedSQLTest`

### What changes were proposed in this pull request?

This PR aims to move `StateDataSourceTransformWithStateSuiteWithRowChecksum` from `SlowSQLTest` to `ExtendedSQLTest` to rebalance the CI test time.

### Why are the changes needed?

`StateDataSourceTransformWithStateSuiteWithRowChecksum` took over 14 minutes in the `slow tests` job (total: `121 min`). This PR moves it to `ExtendedSQLTest`, whose job is shorter (total: `82 min`).

- https://github.com/apache/spark/actions/runs/28135351057/job/83322926462

<img width="612" height="138" alt="Screenshot 2026-06-24 at 19 14 57" src="https://github.com/user-attachments/assets/6b998c20-280f-4015-9e09-366831da0eca" />

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#56758 from dongjoon-hyun/SPARK-57683.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit e6fcb84)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…E-binary`

### What changes were proposed in this pull request?

Fix the artifact name in `LICENSE-binary` from `log4j-slf4j-impl` to `log4j-slf4j2-impl` to match the bundled jar.

### Why are the changes needed?

Apache Spark 3.4.0+ distributions bundle `log4j-slf4j2-impl-*.jar`, but `LICENSE-binary` listed the old `log4j-slf4j-impl` name.

- apache#37844

https://github.com/apache/spark/blob/e6fcb8440915e6e0f4f4945b486bfb2c568f9562/pom.xml#L840-L841

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56761 from dongjoon-hyun/SPARK-57686.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 72d5e6b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
… keys

### What changes were proposed in this pull request?

`LevelDBIterator` and `RocksDBIterator` each have a static `compare(byte[], byte[])` used to order keys during iteration, and it compared bytes with `a[i] - b[i]`. That had two problems. The `diff += ...` accumulation was dead: the loop returns on the first non-zero byte, so it was really just `=`. More importantly, the signed byte subtraction disagrees with how LevelDB and RocksDB order keys, which is unsigned bytewise (their default comparators use `memcmp`). For any key byte `>= 0x80`, such as the UTF-8 of a non-ASCII string index value, the comparator's order diverged from the store's. Both methods now delegate to `Arrays.compareUnsigned`, the JDK method that performs exactly an unsigned lexicographic byte-array comparison, and they are marked `VisibleForTesting` to match the sibling `startsWith` helper.
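
The difference between the two orderings is easy to see in a small Java sketch. `KeyOrderSketch` is hypothetical, and `signedCompare` only reconstructs the signed-subtraction pattern described above for contrast. `Arrays.compareUnsigned(byte[], byte[])` is the JDK method (Java 9+) the fix delegates to.

```java
import java.util.Arrays;

// Contrast between signed and unsigned byte ordering; names are hypothetical.
final class KeyOrderSketch {
    // Signed subtraction, reconstructed for illustration.
    static int signedCompare(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        for (int i = 0; i < minLen; i++) {
            int diff = a[i] - b[i]; // Java bytes are signed: (byte) 0x80 is -128
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Unsigned lexicographic order, as used by the stores' default bytewise comparators.
    static int unsignedCompare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

For the single-byte keys `0x80` and `0x7f`, `signedCompare` returns `-128 - 127 = -255`, while `unsignedCompare` returns a positive value, so only the latter agrees with the store. For ASCII-only keys, both give the same sign.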

### Why are the changes needed?

The comparator should agree with the ordering the underlying store uses. With signed bytes it did not, so for non-ASCII keys the iterator could compare a key against its bounds incorrectly. ASCII keys are unaffected, which is why this never surfaced in practice, but the comparator was still wrong. The method is called at four iterator sites that use only the sign and two type-info sites that use `!= 0`, all of which stay correct under the change.

### Does this PR introduce _any_ user-facing change?

No. It only corrects key ordering that was previously wrong, and only for keys containing bytes `>= 0x80`.

### How was this patch tested?

Added `DBIteratorCompareSuite`, which calls both static comparators directly (no database, so it runs on every platform) and checks equal arrays, prefix and length ordering, and the unsigned cases such as `0x80 > 0x7f` and `0xff > 0x00`. Reverting either method to `Arrays.compare` (the signed variant) fails the suite, so it is a real regression guard. The full kvstore test suite passes locally, with the LevelDB suites skipped on Apple Silicon and covered by CI, and checkstyle reports no violations.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#56740 from LuciferYang/SPARK-kvstore-iterator-compare.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit e19d4e6)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
### What changes were proposed in this pull request?

This PR aims to add `arrow-compression` to `LICENSE-binary`.

### Why are the changes needed?

Apache Spark 4.1.0+ distributions bundle `arrow-compression-*.jar`, but `LICENSE-binary` did not list it.

- apache#52747

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56762 from dongjoon-hyun/SPARK-57687.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6c2e0d8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR aims to add `javax.servlet-api` to `LICENSE-binary`.

### Why are the changes needed?

Apache Spark 4.0.0+ distributions bundle `javax.servlet-api-*.jar`, which is missing from `LICENSE-binary`.

- apache#45154

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56763 from dongjoon-hyun/SPARK-57682.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 994f049)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ernetes-httpclient-vertx` in `LICENSE-binary`

### What changes were proposed in this pull request?

This PR updates `LICENSE-binary` to replace `io.fabric8:kubernetes-httpclient-okhttp`
with `io.fabric8:kubernetes-httpclient-vertx`:

```
-io.fabric8:kubernetes-httpclient-okhttp
+io.fabric8:kubernetes-httpclient-vertx
```

### Why are the changes needed?

After the fabric8 `kubernetes-client` 7.x upgrade (SPARK-50493) switched the HTTP
backend from OkHttp to Vert.x, the distribution bundles `kubernetes-httpclient-vertx`
instead of `kubernetes-httpclient-okhttp`. This keeps `LICENSE-binary` consistent
with the actual bundled jars. Both are Apache License 2.0.

- apache#49159

https://github.com/apache/spark/blob/e6fcb8440915e6e0f4f4945b486bfb2c568f9562/dev/deps/spark-deps-hadoop-3-hive-2.3#L160

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56765 from dongjoon-hyun/SPARK-57689.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5d2a111)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

`VariantUtil.getString` and `VariantUtil.getMetadataKey` decoded their bytes with `new String(bytes, offset, length)`, which uses the JVM default charset. This PR decodes both with `StandardCharsets.UTF_8`, matching how `VariantBuilder` writes keys and string values (`getBytes(StandardCharsets.UTF_8)`) and the UTF-8 string data defined by the variant binary format. A regression test, `VariantUtf8DecodeSuite`, is added in the `common/variant` module next to `VariantUtil`.

### Why are the changes needed?

The variant binary stores dictionary keys and string content as UTF-8, but the reader decoded them with the platform default charset, so the write and read sides disagree whenever that charset is not UTF-8. On Java 17 the default charset is environment-dependent (for example `US-ASCII` when the JVM starts under `LANG=C`), and reading a variant that contains non-ASCII keys or strings then silently corrupts those characters.
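The write/read mismatch can be reproduced in plain Java without depending on how the JVM was started, by passing the charset explicitly. This is a minimal sketch, not Spark's `VariantUtil` code: `encode`, `decodeWith`, and `decodeUtf8` are hypothetical helpers, and the one-byte prefix is only a placeholder used to exercise `offset`/`length`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class VariantStringDecodeDemo {
    // Writer side: string bytes are produced as UTF-8, mirroring
    // getBytes(StandardCharsets.UTF_8) on the write path.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Pre-fix style reader (hypothetical helper). The original code called
    // new String(bytes, offset, length), which uses Charset.defaultCharset();
    // here the charset is a parameter so the mismatch is reproducible anywhere.
    static String decodeWith(byte[] bytes, int offset, int length, Charset charset) {
        return new String(bytes, offset, length, charset);
    }

    // Post-fix style reader (hypothetical helper): always decode as UTF-8.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute e
        byte[] payload = encode(original);
        // Place the string after a placeholder byte (not a real variant header).
        byte[] buf = new byte[1 + payload.length];
        buf[0] = 0x01;
        System.arraycopy(payload, 0, buf, 1, payload.length);

        // Simulate a JVM whose default charset is ISO-8859-1.
        String broken = decodeWith(buf, 1, payload.length, StandardCharsets.ISO_8859_1);
        String fixed = decodeUtf8(buf, 1, payload.length);

        System.out.println("UTF-8 bytes: " + payload.length);         // 5
        System.out.println("ISO-8859-1 chars: " + broken.length());   // 5
        System.out.println("UTF-8 chars: " + fixed.length());         // 4
        System.out.println("round-trips: " + fixed.equals(original)); // true
        System.out.println("corrupted: " + !broken.equals(original)); // true
    }
}
```

The acute `e` occupies two UTF-8 bytes, so a single-byte charset turns it into two unrelated characters, while decoding with the same charset the writer used restores the original string.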

### Does this PR introduce _any_ user-facing change?

No change when the JVM default charset is UTF-8, which is the common case and what CI runs. Under a non-UTF-8 default charset, operations that read variant strings or keys -- `to_json(variant)`, casting a variant to string, `schema_of_variant`, and field/key extraction -- now return the correct characters instead of corrupting non-ASCII keys and string values.

### How was this patch tested?

Added `VariantUtf8DecodeSuite` in `common/variant`. A plain unit test cannot exercise the bug: the default charset is fixed at JVM startup and pinned to `-Dfile.encoding=UTF-8` in CI. Instead, the test forks a child JVM with `-Dfile.encoding=ISO-8859-1` and round-trips a variant with non-ASCII object keys and string values there, including a value over 63 UTF-8 bytes to cover the `LONG_STR` path. With the pre-fix default-charset decode, the characters are corrupted (for example, `café` becomes `cafÃ©`) and the child exits non-zero; with the fix, it passes. I confirmed both directions by temporarily reverting the fix. The test self-cancels if the child's default charset turns out to be UTF-8. Run it with `build/sbt 'variant/testOnly *VariantUtf8DecodeSuite'`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56659 from LuciferYang/SPARK-variant-utf8-string-decode.

Authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 09205bb)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…e `HiveClientSuites`

### What changes were proposed in this pull request?

This PR adds the `SlowHiveTest` tag to the version-parameterized `HiveClientSuite` so that its tests are classified and run as slow Hive tests.

### Why are the changes needed?

A ScalaTest tag annotation applies only to a suite's own tests, not to its nested suites. The `HiveClientSuites` wrapper already carries `SlowHiveTest`, but it runs the per-version `HiveClientSuite` instances as nested suites, so those tests were left untagged. This PR tags the inner suite so they are classified as slow Hive tests too.

https://github.com/apache/spark/blob/526d88da65e0ba200fcca80285733731f4d3e2ba/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuites.scala#L37-L39

In other words, although `HiveClientSuites` is marked as `SlowHiveTest`, its tests currently run in the `hive - other tests` job. After this PR, they run in the `hive - slow tests` job as intended.
- https://github.com/apache/spark/actions/runs/28171846893/job/83441883795

```
HiveClientSuites:
[info] HiveClientSuite(2.0):
[info] - 2.0: create client (431 milliseconds)
[info] - 2.0: createDatabase (2 seconds, 602 milliseconds)
...
```
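The tag-propagation behavior can be modeled with plain Java annotations. This is a simplified sketch, not ScalaTest's implementation: `SlowTest`, `WrapperSuite`, `UntaggedVersionSuite`, and `TaggedVersionSuite` are hypothetical stand-ins for `SlowHiveTest`, `HiveClientSuites`, and `HiveClientSuite`, and the tag check inspects only each suite's own class, matching the behavior described above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class NestedSuiteTagDemo {
    // Hypothetical stand-in for a class-level tag annotation such as SlowHiveTest.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SlowTest {}

    // Simplified suite model: only the suite's own class is inspected for the tag.
    static class Suite {
        final String name;
        Suite(String name) { this.name = name; }
        List<Suite> nestedSuites() { return List.of(); }
        boolean isTaggedSlow() { return getClass().isAnnotationPresent(SlowTest.class); }
    }

    // Per-version inner suite without the tag (before the fix).
    static class UntaggedVersionSuite extends Suite {
        UntaggedVersionSuite(String v) { super("VersionSuite(" + v + ")"); }
    }

    // Per-version inner suite carrying the tag itself (after the fix).
    @SlowTest
    static class TaggedVersionSuite extends Suite {
        TaggedVersionSuite(String v) { super("VersionSuite(" + v + ")"); }
    }

    // Tagged wrapper that runs one inner suite per version as nested suites.
    @SlowTest
    static class WrapperSuite extends Suite {
        private final boolean tagInner;
        WrapperSuite(boolean tagInner) {
            super("WrapperSuite");
            this.tagInner = tagInner;
        }
        @Override
        List<Suite> nestedSuites() {
            List<Suite> out = new ArrayList<>();
            for (String v : List.of("2.0", "2.1")) { // illustrative versions only
                out.add(tagInner ? new TaggedVersionSuite(v) : new UntaggedVersionSuite(v));
            }
            return out;
        }
    }

    // Number of nested suites a "slow tests only" selection would include.
    static long slowNestedCount(Suite wrapper) {
        return wrapper.nestedSuites().stream().filter(Suite::isTaggedSlow).count();
    }

    public static void main(String[] args) {
        System.out.println("wrapper tagged: " + new WrapperSuite(false).isTaggedSlow());       // true
        System.out.println("slow nested before: " + slowNestedCount(new WrapperSuite(false))); // 0
        System.out.println("slow nested after: " + slowNestedCount(new WrapperSuite(true)));   // 2
    }
}
```

Tagging the wrapper alone leaves the nested count at 0; only annotating the inner class changes the selection, which corresponds to adding `SlowHiveTest` to `HiveClientSuite`.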

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Closes apache#56780 from dongjoon-hyun/SPARK-57697.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit d9fc0c1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?

This PR reworks column ID validation for nested fields in DSv2.

### Why are the changes needed?

The original implementation detected dropped-and-re-added columns by comparing top-level `Column.id()` strings in a dedicated `validateColumnIds` pass. This approach had no visibility into nested struct fields, array elements, or map keys/values. To work around this limitation, connectors had to encode nested field IDs into the top-level ID string (as demonstrated by `ComposedColumnIdTableCatalog`), which placed an unreasonable burden on connector authors and made the feature fragile by design.

The new mechanism stores field IDs in `StructField` metadata and validates them within `validateSchemaCompatibility`.
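To make the nested case concrete, here is a simplified Java sketch of ID-based validation that walks struct fields, array elements, and map keys/values of two schema versions in parallel and flags any path whose ID changed, i.e. a dropped-and-re-added field. It is a model under stated assumptions: the record types, the nullable integer IDs, and the names `idMismatches`, `checkId`, and `walk` are hypothetical, and it does not reproduce how Spark encodes IDs in `StructField` metadata or what `validateSchemaCompatibility` actually checks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedFieldIdCheck {
    // Hypothetical schema model, not Spark's StructField/metadata API.
    // IDs are nullable Integers (null = no ID recorded).
    interface Type {}
    record Primitive(String name) implements Type {}
    record Field(String name, Integer id, Type type) {}
    record Struct(List<Field> fields) implements Type {}
    record ArrayT(Integer elementId, Type element) implements Type {}
    record MapT(Integer keyId, Type key, Integer valueId, Type value) implements Type {}

    // Returns paths whose IDs differ between two schema versions.
    static List<String> idMismatches(Type expected, Type actual) {
        List<String> out = new ArrayList<>();
        walk("", expected, actual, out);
        return out;
    }

    private static void checkId(String path, Integer expectedId, Integer actualId, List<String> out) {
        if (expectedId != null && actualId != null && !expectedId.equals(actualId)) {
            out.add(path + " (id " + expectedId + " -> " + actualId + ")");
        }
    }

    private static void walk(String path, Type expected, Type actual, List<String> out) {
        if (expected instanceof Struct e && actual instanceof Struct a) {
            Map<String, Field> actualByName = new LinkedHashMap<>();
            for (Field f : a.fields()) {
                actualByName.put(f.name(), f);
            }
            for (Field ef : e.fields()) {
                Field af = actualByName.get(ef.name());
                if (af == null) {
                    continue; // fields missing from the new schema are out of scope here
                }
                String p = path.isEmpty() ? ef.name() : path + "." + ef.name();
                checkId(p, ef.id(), af.id(), out);
                walk(p, ef.type(), af.type(), out);
            }
        } else if (expected instanceof ArrayT e && actual instanceof ArrayT a) {
            String p = path + ".element";
            checkId(p, e.elementId(), a.elementId(), out);
            walk(p, e.element(), a.element(), out);
        } else if (expected instanceof MapT e && actual instanceof MapT a) {
            checkId(path + ".key", e.keyId(), a.keyId(), out);
            walk(path + ".key", e.key(), a.key(), out);
            checkId(path + ".value", e.valueId(), a.valueId(), out);
            walk(path + ".value", e.value(), a.value(), out);
        }
        // Primitives and changed types carry no nested IDs to compare in this sketch.
    }

    public static void main(String[] args) {
        Type before = new Struct(List.of(
            new Field("id", 1, new Primitive("int")),
            new Field("point", 2, new Struct(List.of(
                new Field("x", 3, new Primitive("double")),
                new Field("y", 4, new Primitive("double")))))));
        // point.y was dropped and re-added, so it received a fresh ID.
        Type after = new Struct(List.of(
            new Field("id", 1, new Primitive("int")),
            new Field("point", 2, new Struct(List.of(
                new Field("x", 3, new Primitive("double")),
                new Field("y", 5, new Primitive("double")))))));
        System.out.println(idMismatches(before, after)); // [point.y (id 4 -> 5)]
    }
}
```

Because IDs live next to each nested field rather than inside a single top-level string, a change deep in the schema is reported at its exact path without any connector-side encoding.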

### Does this PR introduce _any_ user-facing change?

Yes, but it targets unreleased functionality and must be cherry-picked to 4.2.

### How was this patch tested?

Existing and new tests.

### Was this patch authored or co-authored using generative AI tooling?

Claude Code v2.1.183

Closes apache#56796 from aokolnychyi/spark-57544-4.2.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…1746

# Conflicts:
#	.github/workflows/build_and_test.yml
#	.github/workflows/build_infra_images_cache.yml
#	R/pkg/DESCRIPTION
#	assembly/pom.xml
#	common/kvstore/pom.xml
#	common/network-common/pom.xml
#	common/network-shuffle/pom.xml
#	common/network-yarn/pom.xml
#	common/sketch/pom.xml
#	common/tags/pom.xml
#	common/unsafe/pom.xml
#	common/utils-java/pom.xml
#	common/utils/pom.xml
#	common/variant/pom.xml
#	connector/avro/pom.xml
#	connector/docker-integration-tests/pom.xml
#	connector/kafka-0-10-assembly/pom.xml
#	connector/kafka-0-10-sql/pom.xml
#	connector/kafka-0-10-token-provider/pom.xml
#	connector/kafka-0-10/pom.xml
#	connector/kinesis-asl-assembly/pom.xml
#	connector/kinesis-asl/pom.xml
#	connector/profiler/pom.xml
#	connector/protobuf/pom.xml
#	connector/spark-ganglia-lgpl/pom.xml
#	core/pom.xml
#	docs/_config.yml
#	examples/pom.xml
#	graphx/pom.xml
#	hadoop-cloud/pom.xml
#	launcher/pom.xml
#	mllib-local/pom.xml
#	mllib/pom.xml
#	pom.xml
#	python/pyspark/version.py
#	repl/pom.xml
#	resource-managers/kubernetes/core/pom.xml
#	resource-managers/kubernetes/integration-tests/pom.xml
#	resource-managers/yarn/pom.xml
#	sql/api/pom.xml
#	sql/catalyst/pom.xml
#	sql/connect/client/jdbc/pom.xml
#	sql/connect/client/jvm/pom.xml
#	sql/connect/common/pom.xml
#	sql/connect/server/pom.xml
#	sql/connect/shims/pom.xml
#	sql/core/pom.xml
#	sql/hive-thriftserver/pom.xml
#	sql/hive/pom.xml
#	sql/pipelines/pom.xml
#	streaming/pom.xml
#	tools/pom.xml
#	udf/worker/core/pom.xml
#	udf/worker/proto/pom.xml
@giggsoff force-pushed the hotfix/NGSOK-1746 branch from fad8070 to 654942e on June 26, 2026 06:38