[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager createTable(tableName, path) by haoyangeng-db · Pull Request #56211 · apache/spark

haoyangeng-db · 2026-05-29T20:02:09Z

What changes were proposed in this pull request?

SPARK-52812 (#56064) made Spark Connect Catalog.createTable eager by re-routing the two-argument createTable(tableName, path) overload through createTable(tableName, path, "parquet"). That hardcodes the parquet provider and drops the spark.sql.sources.default fallback that the overload previously relied on.

This PR restores the original behavior: the two-argument overload again leaves the source unset so the server resolves spark.sql.sources.default, while keeping the eager execution introduced by SPARK-52812. A regression test is added to CatalogSuite.

Why are the changes needed?

The two-argument createTable(tableName, path) overload is documented as "It will use the default data source configured by spark.sql.sources.default." After SPARK-52812 it always used parquet regardless of that configuration, contradicting its own contract and the classic Catalog behavior.

Does this PR introduce any user-facing change?

Yes, within the unreleased master branch. spark.catalog.createTable(tableName, path) on Spark Connect once again honors spark.sql.sources.default instead of always creating a parquet table. The eager-execution behavior from SPARK-52812 is preserved.

How was this patch tested?

Added a regression test in CatalogSuite that sets spark.sql.sources.default to json, writes JSON data, creates the table via the two-argument overload, and asserts the resulting table uses the json provider and is readable. The test fails on the previous hardcoded-parquet behavior.

Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.

…reateTable(tableName, path) ### What changes were proposed in this pull request? SPARK-52812 (apache#56064) made Spark Connect `Catalog.createTable` eager by re-routing the two-argument `createTable(tableName, path)` overload through `createTable(tableName, path, "parquet")`. That hardcodes the parquet provider and drops the `spark.sql.sources.default` fallback that the overload previously relied on. This PR restores the original behavior: the two-argument overload again leaves the source unset so the server resolves `spark.sql.sources.default`, while keeping the eager execution introduced by SPARK-52812. A regression test is added to `CatalogSuite`. ### Why are the changes needed? The two-argument `createTable(tableName, path)` overload is documented as "It will use the default data source configured by spark.sql.sources.default." After SPARK-52812 it always used parquet regardless of that configuration, contradicting its own contract and the classic Catalog behavior. ### Does this PR introduce _any_ user-facing change? Yes, within the unreleased master branch. `spark.catalog.createTable(tableName, path)` on Spark Connect once again honors `spark.sql.sources.default` instead of always creating a parquet table. The eager-execution behavior from SPARK-52812 is preserved. ### How was this patch tested? Added a regression test in `CatalogSuite` that sets `spark.sql.sources.default` to `json`, writes JSON data, creates the table via the two-argument overload, and asserts the resulting table uses the json provider and is readable. The test fails on the previous hardcoded-parquet behavior. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.8)

cloud-fan

0 blocking, 1 non-blocking, 0 nits.
Clean, correct, well-tested follow-up restoring the spark.sql.sources.default fallback while keeping eager execution.

Design / architecture (1)

Catalog.scala:398: 2-arg overload inlines the proto build, bypassing the delegation chain all other overloads use and duplicating the 5-arg execute{}+table() block. Correct and the setSource omission is necessary, but a private source: Option[String] impl shared by both would remove the duplication. Non-blocking. — see inline

cloud-fan · 2026-05-30T01:38:50Z

+    // Leave the source unset so the server resolves spark.sql.sources.default, as documented
+    // above. Routing through createTable(tableName, path, "parquet") would hardcode the provider
+    // and ignore that configuration.
+    sparkSession.execute { builder =>


The two-argument overload inlines the CreateTable proto build here, bypassing the delegation chain every other createTable overload uses (each delegates down to the single 5-arg general overload) and duplicating that method's execute{...} + table() block, minus setSource.

The behavior is correct and omitting setSource is necessary — source is optional in the proto, so the server only resolves spark.sql.sources.default when hasSource() is false, and delegating to the 5-arg overload would call setSource("") and pin the source to a literal empty string. So this is non-blocking.

That said, the duplication can be removed cleanly by extracting a private impl taking source: Option[String] that both the 2-arg (None) and 5-arg (Some(source)) overloads route through, calling setSource only when defined. That keeps a single place building the proto while preserving the unset-source semantics. Your call.

cloud-fan reviewed May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager createTable(tableName, path)#56211

[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager createTable(tableName, path)#56211
haoyangeng-db wants to merge 1 commit into
apache:masterfrom
haoyangeng-db:spark-52812-followup-createtable-default-source

haoyangeng-db commented May 29, 2026

Uh oh!

cloud-fan left a comment

Uh oh!

cloud-fan May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

haoyangeng-db commented May 29, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

cloud-fan left a comment

Choose a reason for hiding this comment

Design / architecture (1)

Uh oh!

cloud-fan May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants