feat: Add Spark SQL parser dialect config #22529

Open
kumarUjjawal wants to merge 3 commits into apache:main from kumarUjjawal:fix/add-spark-dialect

@kumarUjjawal
Contributor

Which issue does this PR close?

Rationale for this change

This lets users combine Spark-compatible functions with Spark SQL parsing when they choose the dialect.

What changes are included in this PR?

  • Add Spark to the SQL parser dialect config enum.
  • Accept spark and sparksql as dialect config values.
  • Use Spark dialect for Spark sqllogictests.
  • Keep with_spark_features() config-neutral.
  • Update dialect docs and expected config output.
  • Add tests for Spark dialect parsing and Spark SQL execution with Spark functions.
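
To illustrate the second bullet, here is a minimal, standalone sketch of how a dialect config value could map both `spark` and `sparksql` onto one enum variant. The `Dialect` enum below is a simplified stand-in with only three variants, the `String` error type is hypothetical, and case-insensitive matching is an assumption of this sketch; it is not the code from this PR.

```rust
use std::str::FromStr;

// Simplified stand-in for the parser dialect config enum (assumption:
// the real enum has more variants and a different error type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dialect {
    Generic,
    PostgreSQL,
    Spark,
}

impl FromStr for Dialect {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption of this sketch.
        match s.to_ascii_lowercase().as_str() {
            "generic" => Ok(Dialect::Generic),
            "postgresql" | "postgres" => Ok(Dialect::PostgreSQL),
            // Both spellings resolve to the same variant.
            "spark" | "sparksql" => Ok(Dialect::Spark),
            other => Err(format!("unsupported SQL dialect: {other}")),
        }
    }
}

fn main() {
    for value in ["spark", "sparksql", "generic", "oracle"] {
        match value.parse::<Dialect>() {
            Ok(dialect) => println!("{value} -> {dialect:?}"),
            Err(err) => println!("{value} -> error: {err}"),
        }
    }
}
```

Mapping aliases in a single match arm keeps the accepted spellings next to the variant they produce, so adding another alias later is a one-line change.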

Are these changes tested?

Yes

Are there any user-facing changes?

Users can now set datafusion.sql_parser.dialect to spark or sparksql.

@github-actions github-actions Bot added documentation Improvements or additions to documentation core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate spark labels May 26, 2026
Contributor

@kosiew kosiew left a comment

@kumarUjjawal
Looks good overall. I have one small suggestion to strengthen test coverage.

Comment thread datafusion/spark/src/session_state.rs Outdated

// Spark function + Spark dialect parsing
let result = ctx
.sql("SELECT sha2('abc', 256)")
Contributor

Nice addition. This test shows that Spark functions are registered, but SELECT sha2('abc', 256) also parses under the generic dialect, so it does not quite exercise the new Spark-dialect path.

Could we make the query use Spark-specific (or Spark-sensitive) syntax, or add a second assertion that would fail under Dialect::Generic? That would help ensure the test covers the full invariant: Spark functions are registered and Spark SQL parsing is active in the same session.

Contributor Author

Thank you, @kosiew. I have updated the test.

@github-actions

github-actions Bot commented Jun 1, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v53.1.0 (current)
       Built [  99.495s] (current)
     Parsing datafusion v53.1.0 (current)
      Parsed [   0.037s] (current)
    Building datafusion v53.1.0 (baseline)
       Built [  98.481s] (baseline)
     Parsing datafusion v53.1.0 (baseline)
      Parsed [   0.038s] (baseline)
    Checking datafusion v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.888s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 201.668s] datafusion
    Building datafusion-cli v53.1.0 (current)
       Built [ 170.355s] (current)
     Parsing datafusion-cli v53.1.0 (current)
      Parsed [   0.036s] (current)
    Building datafusion-cli v53.1.0 (baseline)
       Built [ 169.989s] (baseline)
     Parsing datafusion-cli v53.1.0 (baseline)
      Parsed [   0.037s] (baseline)
    Checking datafusion-cli v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.166s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 345.579s] datafusion-cli
    Building datafusion-common v53.1.0 (current)
       Built [  32.598s] (current)
     Parsing datafusion-common v53.1.0 (current)
      Parsed [   0.062s] (current)
    Building datafusion-common v53.1.0 (baseline)
       Built [  32.771s] (baseline)
     Parsing datafusion-common v53.1.0 (baseline)
      Parsed [   0.062s] (baseline)
    Checking datafusion-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.988s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant Dialect:Spark in /home/runner/work/datafusion/datafusion/datafusion/common/src/config.rs:345

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  68.033s] datafusion-common
    Building datafusion-spark v53.1.0 (current)
       Built [  55.156s] (current)
     Parsing datafusion-spark v53.1.0 (current)
      Parsed [   0.067s] (current)
    Building datafusion-spark v53.1.0 (baseline)
       Built [  55.018s] (baseline)
     Parsing datafusion-spark v53.1.0 (baseline)
      Parsed [   0.064s] (baseline)
    Checking datafusion-spark v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.458s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 112.724s] datafusion-spark
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 170.091s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.024s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 170.408s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.025s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.105s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 343.963s] datafusion-sqllogictest
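
For context on the `enum_variant_added` failure reported for datafusion-common, the following is a minimal sketch of why a new variant on an exhaustive public enum counts as a breaking change. The `Dialect` enum and both helper functions are hypothetical stand-ins, not the code in `config.rs`.

```rust
// Hypothetical stand-in for a public enum without #[non_exhaustive],
// shown after a `Spark` variant has been added.
#[derive(Debug, Clone, Copy)]
pub enum Dialect {
    Generic,
    PostgreSQL,
    Spark, // newly added variant
}

// Downstream-style code with an exhaustive match and no wildcard arm.
// Without the `Spark` arm this function stops compiling (error E0004),
// which is why the lint asks for a major version bump.
fn describe(d: Dialect) -> &'static str {
    match d {
        Dialect::Generic => "generic",
        Dialect::PostgreSQL => "postgresql",
        Dialect::Spark => "spark",
    }
}

// A match with a wildcard arm keeps compiling when variants are added.
// Marking an enum #[non_exhaustive] forces other crates to write this form.
fn describe_with_fallback(d: Dialect) -> &'static str {
    match d {
        Dialect::Generic => "generic",
        Dialect::PostgreSQL => "postgresql",
        _ => "other",
    }
}

fn main() {
    println!("{}", describe(Dialect::Spark));
    println!("{}", describe_with_fallback(Dialect::Spark));
}
```

Whether the enum should gain `#[non_exhaustive]` or the change should simply ship in a major release is a maintainer decision that this sketch does not settle.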

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 1, 2026

Labels

auto detected api change Auto detected API change common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation spark sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Add support for Spark SQL dialect

2 participants