[SPARK-56505][SQL][TESTS] Add SparkSessionBinder to replace SharedSparkSession by fwc · Pull Request #56190 · apache/spark

fwc · 2026-05-28T20:17:32Z

What changes were proposed in this pull request?

Introduces sql.SparkSessionBinder and classic.SparkSessionBinder as 'implementor' counterparts to the 'declarators' sql.SparkSessionProvider and classic.SparkSessionProvider.
Deprecates SharedSparkSession with a hint to use SparkSessionBinder and QueryTest instead.

Why are the changes needed?

Currently, most tests use SharedSparkSession to obtain the spark object. Because SharedSparkSession exposes spark as a classic.SparkSession, it cannot be overridden with a different session type, which prevents specializing these tests in sql/connect.

This PR deprecates SharedSparkSession and instead introduces sql.SparkSessionBinder and classic.SparkSessionBinder. While both create a classic.SparkSession, the sql.SparkSessionBinder has an abstract def spark: sql.SparkSession declaration, so it can be overridden with some trait that provides a connect.SparkSession.

If some FooSuite now uses the sql.SparkSessionBinder trait, e.g.

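class FooSuite extends SparkSessionBinder with QueryTest {
  // Run the check inside a test body so it executes after the session is set up.
  test("select literal") {
    // checkAnswer compares against rows, so the expected value is a Row.
    checkAnswer(sql("SELECT 1"), Row(1))
  }
}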

We can now add a connect variant of that suite as follows:

class FooWithConnectSuite extends FooSuite
  with connect.SparkSessionBinder
  with connect.QueryTest
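
To illustrate why the declared type of spark matters, here is a minimal, Spark-free sketch. Session, ClassicSession, ConnectSession, Binder and ConnectBinder are hypothetical stand-ins introduced only for this example; they are not the traits added by this PR, and the real binders carry more responsibilities (session lifecycle, cleanup).

```scala
// Hypothetical stand-in types; these are NOT the real Spark classes.
trait Session { def sql(query: String): String }

final class ClassicSession extends Session {
  def sql(query: String): String = s"classic: $query"
}

final class ConnectSession extends Session {
  def sql(query: String): String = s"connect: $query"
}

// Rough analogue of sql.SparkSessionBinder: it creates a classic session
// but exposes it only through the general Session type.
trait Binder {
  private lazy val classicSession = new ClassicSession
  def spark: Session = classicSession
}

// Rough analogue of connect.SparkSessionBinder: it rebinds `spark`.
// If Binder had declared `def spark: ClassicSession`, this override would
// not compile, because ConnectSession is not a subtype of ClassicSession.
trait ConnectBinder extends Binder {
  private lazy val connectSession = new ConnectSession
  override def spark: Session = connectSession
}

class FooSuite extends Binder {
  def run(): String = spark.sql("SELECT 1")
}

// Reuses the body of FooSuite unchanged; only the binding differs.
class FooWithConnectSuite extends FooSuite with ConnectBinder

object BinderDemo {
  def main(args: Array[String]): Unit = {
    println(new FooSuite().run())            // classic: SELECT 1
    println(new FooWithConnectSuite().run()) // connect: SELECT 1
  }
}
```

The trait mixed in last wins in Scala's linearization, which is what lets the connect variant replace the binding without touching the suite body.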

Does this PR introduce any user-facing change?

This PR extends the beforeAll/afterAll of SharedSparkSessionBase to include the thread audit check, which was previously only present in SharedSparkSession.
AFAICS, SparkSessionBase is used neither in Delta Lake nor in Apache Iceberg.

How was this patch tested?

This patch is test-only.

Was this patch authored or co-authored using generative AI tooling?

Parts of this patch were authored by Claude Code.

…parkSession This is technically an 'api change' as it moves the thread audit stuff from `test.SharedSparkSession` to `test.SharedSparkSessionBase`. This breaks code that implements `SharedSparkSessionBase` to circumvent the thread audit stuff.


cloud-fan

1 blocking, 2 non-blocking, 3 nits.
Right direction — decoupling the session type so suites can run on classic or Connect. My main feedback is on the author-facing shape: I'd push for a binder-free base + per-env concrete suites, with the bare SparkSessionBinder kept internal.

Design / architecture (1)

sql/core/.../sql/QueryTest.scala:1214: push the binder-free-base + classic/connect-concrete pattern; treat bare SparkSessionBinder as internal — see inline

Suggestions (2)

sql/connect/.../connect/SparkSessionBinder.scala:89: redundant afterEach override with an inaccurate comment — see inline
sql/connect/.../connect/QueryTest.scala:30: only one checkAnswer overload overridden — see inline

Nits: 3 minor items (see inline comments).

cloud-fan · 2026-05-29T04:04:43Z

 }

-class QueryTestSuite extends test.SharedSparkSession {
+class QueryTestSuite extends QueryTest with SparkSessionBinder {


This migration — mixing the bare sql.SparkSessionBinder into a concrete suite — is the shape I'd push back on. sql.SparkSessionBinder binds a classic session but exposes spark only as the abstract sql.SparkSession, so it's really internal plumbing, not what a test author should reach for.

The end-state I'd recommend documenting and demonstrating is a binder-free base + per-env concrete suites:

abstract class FooSuiteBase extends QueryTest { // no binder; spark abstract
  test("shared") { checkAnswer(sql("SELECT 1"), Row(1)) }
}

class FooSuite extends FooSuiteBase with classic.SparkSessionBinder {
  test("classic only") { ... }
}

class FooConnectSuite extends FooSuiteBase
  with connect.SparkSessionBinder
  with connect.QueryTest {
  test("connect only") { ... }
}

QueryTest already mixes in SparkSessionProvider (via SQLTestData) and leaves spark abstract, so it works as the env-agnostic base directly. Concretely: (1) steer the migration and the @deprecated message at classic.SparkSessionBinder / connect.SparkSessionBinder + this base pattern, not the bare binder; (2) QueryTestWithConnectSuite currently demonstrates the retrofit path (extending an already-classic-bound QueryTestSuite and overriding the binding) — a binder-free base would demonstrate the cleaner pattern and double as the template authors copy.
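
As a rough, Spark-free model of the binder-free base pattern described above: the base registers shared tests against an abstract spark, and each concrete suite picks a binding and adds its own tests. Every name here (Session, SessionProvider, SuiteBase, ClassicBinder, ConnectBinder, runAll) is a hypothetical stand-in for illustration, and the tiny test registry only imitates what a real test framework does.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins; none of these are the real Spark or ScalaTest types.
trait Session { def name: String }
object ClassicSession extends Session { val name = "classic" }
object ConnectSession extends Session { val name = "connect" }

// "Declarator": says a session exists without choosing one.
trait SessionProvider { protected def spark: Session }

// Minimal test registry so the sketch runs without a test framework.
abstract class SuiteBase extends SessionProvider {
  private val registered = ListBuffer.empty[(String, () => String)]

  protected def test(name: String)(body: => String): Unit =
    registered += ((name, () => body))

  def runAll(): List[String] =
    registered.toList.map { case (name, body) => s"$name -> ${body()}" }
}

// Binder-free base: `spark` stays abstract, so the tests are env-agnostic.
abstract class FooSuiteBase extends SuiteBase {
  test("shared") { s"ran on ${spark.name}" }
}

// "Implementors": each one binds a concrete session.
trait ClassicBinder extends SessionProvider {
  protected def spark: Session = ClassicSession
}
trait ConnectBinder extends SessionProvider {
  protected def spark: Session = ConnectSession
}

class FooSuite extends FooSuiteBase with ClassicBinder {
  test("classic only") { "classic-specific check" }
}

class FooConnectSuite extends FooSuiteBase with ConnectBinder {
  test("connect only") { "connect-specific check" }
}
```

Calling runAll() on each suite runs the shared test against that suite's binding, followed by the environment-specific test.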

fwc · 2026-05-29

I want to nudge test authors towards writing (somewhat) connect-compatible tests by default, which is why I want them to write tests with a sql.SparkSession in hand.

My fear is that the 'clean' way is not the 'easiest' way. Most current tests do not use an abstract base class, and I fear that most test authors will default to starting a new suite with classic.SparkSessionBinder, as they might not think about connect in that moment:

// hypothetical antipattern, but path of least resistance:
class FooSuite extends QueryTest with classic.SparkSessionBinder {
  test("all tests, both shared and classic only") { ... }
}

I reworked the PR so that SparkSessionBinder now implements QueryTest. Now classic.SparkSessionBinder is a drop-in replacement for SharedSparkSession and sql.SparkSessionBinder provides the new, 'fixed' default.

What do you think of this approach?

cloud-fan · 2026-05-29T04:04:43Z

+    }
+  }
+
+  // The base SharedSparkSessionBase.afterEach calls spark.sharedState which is not supported


This comment is inaccurate after the refactor: connect.SparkSessionBinder extends sql.SparkSessionBinder directly (not SharedSparkSessionBase), and that parent's afterEach clears the cache via the private _spark — the classic session, which is exactly what's used on Connect (createSparkSession isn't overridden). So the parent's afterEach already works here and this override is redundant. If you do keep it, note that skipping super.afterEach() drops the BeforeAndAfterEach chain. Simplest fix is to remove the override entirely.
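
The point about the dropped chain can be shown with a small stackable-trait sketch. It assumes nothing about the real binder code: Lifecycle, CacheClearing and SkipsSuper are hypothetical names, and the log buffer exists only to make the call order visible.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical lifecycle traits, standing in for a stack of afterEach overrides.
trait Lifecycle {
  val log: ListBuffer[String] = ListBuffer.empty
  def afterEach(): Unit = log += "base cleanup"
}

// A well-behaved layer: does its own work, then delegates up the chain.
trait CacheClearing extends Lifecycle {
  override def afterEach(): Unit = {
    log += "clear cache"
    super.afterEach()
  }
}

// An override that forgets super.afterEach(): every layer above it is skipped.
trait SkipsSuper extends CacheClearing {
  override def afterEach(): Unit = log += "connect-only cleanup"
}

class ChainedSuite extends CacheClearing
class BrokenSuite extends SkipsSuper

object AfterEachDemo {
  def main(args: Array[String]): Unit = {
    val chained = new ChainedSuite
    chained.afterEach()
    println(chained.log.toList) // List(clear cache, base cleanup)

    val broken = new BrokenSuite
    broken.afterEach()
    println(broken.log.toList)  // List(connect-only cleanup)
  }
}
```

In the second case neither the cache clearing nor the base cleanup runs, which is the risk of keeping an override that does not call super.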

cloud-fan · 2026-05-29T04:04:43Z

+ */
+trait QueryTest extends sqlApi.QueryTest with SparkSessionProvider {
+
+  override protected def checkAnswer(


This overrides only the checkAnswer(df, Seq[Row]) variant, which is enough for QueryTestSuite. But the stated goal is re-running arbitrary sql/core suites over Connect, and the other QueryTest helpers (other checkAnswer overloads, checkDataset, ...) still reach classic-only paths like queryExecution/logicalPlan. Worth a line in the trait doc noting that broader reuse will need more overrides.
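
A simplified sketch of the overload concern, under the assumption that the overloads are implemented independently of each other (the real QueryTest helpers may delegate differently). BaseChecks and ConnectChecks are hypothetical, and strings stand in for DataFrames and rows.

```scala
// Hypothetical helper classes; strings stand in for DataFrames and rows.
class BaseChecks {
  def checkAnswer(result: String, expected: Seq[String]): String =
    s"classic path (rows): $result"

  // A second overload with its own implementation.
  def checkAnswer(result: String, expected: String): String =
    s"classic path (single row): $result"
}

class ConnectChecks extends BaseChecks {
  // Only the Seq-based overload is redirected.
  override def checkAnswer(result: String, expected: Seq[String]): String =
    s"connect path (rows): $result"
}

object OverloadDemo {
  def main(args: Array[String]): Unit = {
    val checks = new ConnectChecks
    println(checks.checkAnswer("df", Seq("1"))) // connect path (rows): df
    println(checks.checkAnswer("df", "1"))      // classic path (single row): df
  }
}
```

Any overload that is not overridden keeps resolving to the base implementation, so a suite calling it would still exercise the classic-only path.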

cloud-fan · 2026-05-29T04:04:43Z

+/**
+ * Runs [[QueryTestSuite]] tests through a Connect session.
+ *
+ * This validates the `FooSuite with connect.SharedSparkSession` pattern: the existing


There's no connect.SharedSparkSession trait; the pattern this suite actually uses (and that the sibling connect/SparkSessionBinder.scala doc shows) is connect.SparkSessionBinder with connect.QueryTest.

Suggested change

- * This validates the `FooSuite with connect.SharedSparkSession` pattern: the existing

+ * This validates the `FooSuite with connect.SparkSessionBinder with connect.QueryTest` pattern: the existing

cloud-fan · 2026-05-29T04:04:44Z

+  }
+
+  /**
+   * Suites extending [[SharedSparkSession]] are sharing resources (e.g. SparkSession) in their


This doc moved out of SharedSparkSession; the snapshot-before-init logic now lives in this trait, so referring to SharedSparkSession is stale.

Suggested change

- * Suites extending [[SharedSparkSession]] are sharing resources (e.g. SparkSession) in their

+ * Suites extending this trait are sharing resources (e.g. SparkSession) in their

cloud-fan · 2026-05-29T04:04:44Z

+import org.apache.spark.sql
+import org.apache.spark.sql.{classic, QueryTest, QueryTestBase}

+@deprecated("Use SparkSessionBinder and QueryTest instead")


@deprecated takes a since version as its second argument; adding it documents when the deprecation started and matches the convention elsewhere in the codebase. Same on line 59.
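
For reference, a minimal sketch of the two-argument form of Scala's @deprecated annotation. NewBinder and OldSharedSession are hypothetical names, and "0.0.0" is a placeholder, not the version this PR should use.

```scala
object DeprecationDemo {
  // Hypothetical replacement trait.
  trait NewBinder {
    def describe: String = "session bound"
  }

  // First argument: the message shown to users.
  // Second argument: the `since` version ("0.0.0" is a placeholder).
  @deprecated("Use NewBinder instead", "0.0.0")
  trait OldSharedSession extends NewBinder
}
```

Code that still extends the deprecated trait compiles, but the compiler reports a deprecation warning that includes both the message and the since value.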


fwc · 2026-05-29T18:11:18Z

Hi @cloud-fan, I changed the PR so that SharedSparkSession is now an empty alias of classic.SparkSessionBinder with a deprecation note that recommends using sql.SparkSessionBinder if possible.

I am unsure about the SparkSessionBinder name:

AFAICS SharedSparkSession is/was the testing trait (~500 extends/implements usages compared to ~150 usages of QueryTest, both according to IntelliJ's "Find Usages").
If it weren't a breaking change, I'd want to rename QueryTest to QueryTestHelpers and SparkSessionBinder to QueryTest. What do you think? Maybe QuerySuite? Maybe SparkSessionSuite?

Matthis Gördel added 8 commits May 28, 2026 19:41

Add classic.SparkSessionProvider

5e059ba

Move test.SharedSparkSessionBase functionality to sql.SharedSparkSession

be85c95

Rename sql.SharedSparkSession to sql.SparkSessionBinder to prevent sh…

c361709

…adowing

Deprecate test.SharedSparkSession

032a684

Add connect.SparkSession{Provider,Binder}, connect.QueryTest and demo

c9ea754

Add classic.SparkSessionBinder with usage demonstration

c9c8538

fixup: fix compile error

4c35b22

fwc force-pushed the sharedsparksession-refactor-mostly-nonbreaking branch from b7ba3f5 to 4c35b22 Compare May 28, 2026 20:54

cloud-fan reviewed May 29, 2026


Matthis Gördel added 3 commits May 29, 2026 17:32

Restructure so that SparkSessionBinder implements QueryTest, address …

c52b50f

…nits

fixup

642166b

Have SharedSparkSession as empty alias of classic.SparkSessionBinder

d51a967

fixup

6b5dba7

fwc requested a review from cloud-fan May 29, 2026 22:43

fwc wants to merge 12 commits into apache:master from fwc:sharedsparksession-refactor-mostly-nonbreaking


2 participants
