feat: Add SQL planner, physical planner, and TableProvider hook for MERGE INTO #22988

Open
wirybeaver wants to merge 2 commits into apache:main from wirybeaver:feature/mergeinto
@wirybeaver

Which issue does this PR close?

Follow-up to #20763 (merged), which added MergeIntoOp, MergeIntoClause, and proto types.

Rationale for this change

MERGE INTO (SQL:2003) is a widely-used DML statement for upsert/conditional update workloads. This PR wires the types introduced in #20763 through the SQL planner, physical planner, and TableProvider trait so that table implementations can actually execute merge operations.

What changes are included in this PR?

datafusion/catalog — TableProvider trait extension

  • Add merge_into(source, on, clauses) async method with a default not_impl_err impl so existing providers are unaffected.
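The default-method pattern described above can be sketched roughly as follows. This is a dependency-free illustration, not the PR's code: the stand-in types `SourcePlan`, `Expr`, `MergeIntoClause`, and `PlanError`, the `name` method, and the `u64` return value are all assumptions made for the example, and the method is synchronous here whereas the PR's `merge_into` is async.

```rust
// Stand-in types (hypothetical simplifications, not DataFusion's definitions).
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

pub struct SourcePlan; // stands in for the USING-clause ExecutionPlan
pub struct Expr(pub String); // stands in for the ON condition
pub struct MergeIntoClause; // stands in for one WHEN clause

pub trait TableProvider {
    fn name(&self) -> &str;

    // Default body: providers that do not override this method report
    // "not implemented", so existing implementations keep compiling.
    // The u64 return value is a placeholder chosen for this sketch.
    fn merge_into(
        &self,
        _source: &SourcePlan,
        _on: &Expr,
        _clauses: &[MergeIntoClause],
    ) -> Result<u64, PlanError> {
        Err(PlanError::NotImplemented(format!(
            "MERGE INTO is not supported for table {}",
            self.name()
        )))
    }
}

// An existing provider: untouched, inherits the default.
pub struct LegacyTable;
impl TableProvider for LegacyTable {
    fn name(&self) -> &str {
        "legacy"
    }
}

// A provider that opts in by overriding the hook.
pub struct MergeableTable;
impl TableProvider for MergeableTable {
    fn name(&self) -> &str {
        "mergeable"
    }

    fn merge_into(
        &self,
        _source: &SourcePlan,
        _on: &Expr,
        clauses: &[MergeIntoClause],
    ) -> Result<u64, PlanError> {
        // Pretend result: just report how many clauses were received.
        Ok(clauses.len() as u64)
    }
}
```

In this sketch `LegacyTable` needs no code change to keep compiling, which is the compatibility property the default impl is meant to give.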

datafusion/sql — SQL → LogicalPlan

  • statement.rs: parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto.
  • Resolve the target table and plan the USING source into a LogicalPlan.
  • Build a combined target+source schema to resolve ON and WHEN expressions.
  • Convert ON condition and WHEN MATCHED / NOT MATCHED clauses to DataFusion Expr.
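To illustrate why a combined target+source schema is needed, here is a minimal sketch of resolving column references against one. An ON condition such as `t.id = s.id` refers to both sides, so neither schema alone can resolve it. The `Field`, `combined_schema`, and `resolve` names are hypothetical, and the ambiguity rule shown is an assumption about typical SQL name resolution rather than something taken from the PR.

```rust
// Hypothetical, simplified field: a table qualifier plus a column name.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub qualifier: String,
    pub name: String,
}

fn qualify(qualifier: &str, cols: &[&str]) -> Vec<Field> {
    cols.iter()
        .map(|c| Field {
            qualifier: qualifier.to_string(),
            name: c.to_string(),
        })
        .collect()
}

// Target fields first, then source fields, each keeping its qualifier.
pub fn combined_schema(
    target: &str,
    target_cols: &[&str],
    source: &str,
    source_cols: &[&str],
) -> Vec<Field> {
    let mut fields = qualify(target, target_cols);
    fields.extend(qualify(source, source_cols));
    fields
}

// Resolve a (possibly unqualified) column reference to an index in the
// combined schema. Unqualified names matching both sides are rejected
// as ambiguous (an assumption for this sketch).
pub fn resolve(schema: &[Field], qualifier: Option<&str>, name: &str) -> Result<usize, String> {
    let matches: Vec<usize> = schema
        .iter()
        .enumerate()
        .filter(|(_, f)| f.name == name && qualifier.map_or(true, |q| f.qualifier == q))
        .map(|(i, _)| i)
        .collect();
    match matches.as_slice() {
        [i] => Ok(*i),
        [] => Err(format!("column {name} not found")),
        _ => Err(format!("column {name} is ambiguous")),
    }
}
```

With target `t(id, qty)` and source `s(id, delta)`, `s.id` and the unqualified `delta` both resolve, while an unqualified `id` is rejected because it appears on both sides.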

datafusion/expr — expression plumbing

  • MergeIntoOp::exprs(): stable iteration over all expressions (ON, then per-clause predicate + action values).
  • MergeIntoOp::with_new_exprs(): rebuild op from a transformed expr vector.
  • Branch LogicalPlan::apply_expressions, map_expressions, and with_new_exprs on WriteOp::MergeInto so optimizers can rewrite merge expressions. Other WriteOp variants are unchanged.
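The exprs / with_new_exprs pairing can be sketched as below. The toy `Expr`, `Clause`, and `MergeOp` types are assumptions for illustration and are much simpler than the PR's MergeIntoOp. The point is the contract: `exprs()` flattens in a fixed order (ON, then each clause's predicate followed by its action values), and `with_new_exprs()` consumes a vector in that same order to rebuild the op.

```rust
// Toy expression type (hypothetical; the real Expr is far richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Clause {
    pub predicate: Option<Expr>, // optional AND condition of a WHEN clause
    pub values: Vec<Expr>,       // action values (e.g. SET / VALUES exprs)
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergeOp {
    pub on: Expr,
    pub clauses: Vec<Clause>,
}

impl MergeOp {
    // Stable order: ON first, then per clause: predicate (if any), values.
    pub fn exprs(&self) -> Vec<&Expr> {
        let mut out = vec![&self.on];
        for clause in &self.clauses {
            out.extend(clause.predicate.iter());
            out.extend(clause.values.iter());
        }
        out
    }

    // Rebuild the op from a transformed vector laid out in exprs() order.
    // The shape (which clauses have predicates, how many values) is taken
    // from self; only the expressions themselves are replaced.
    pub fn with_new_exprs(&self, exprs: Vec<Expr>) -> Result<MergeOp, String> {
        let expected = self.exprs().len();
        if exprs.len() != expected {
            return Err(format!("expected {expected} expressions, got {}", exprs.len()));
        }
        let mut it = exprs.into_iter();
        let on = it
            .next()
            .ok_or_else(|| "missing ON expression".to_string())?;
        let mut clauses = Vec::with_capacity(self.clauses.len());
        for clause in &self.clauses {
            let predicate = if clause.predicate.is_some() {
                it.next()
            } else {
                None
            };
            let values: Vec<Expr> = it.by_ref().take(clause.values.len()).collect();
            clauses.push(Clause { predicate, values });
        }
        Ok(MergeOp { on, clauses })
    }
}
```

Because both methods agree on the ordering, an optimizer-style pass can map over `exprs()`, rewrite each expression, and hand the result back to `with_new_exprs()` without knowing the clause structure.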

datafusion/core — physical planner dispatch

  • Dispatch WriteOp::MergeInto in the physical planner.
  • Recover the TableProvider via source_as_provider(), extract the source ExecutionPlan, and call TableProvider::merge_into.
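A rough, self-contained sketch of the dispatch shape follows. All types here are simplified stand-ins: plans and expressions are plain strings, `plan_dml`, `EchoMerge`, `NoMerge`, and `OpaqueSource` are hypothetical names, and the local `source_as_provider` only mimics the downcast idea behind the real helper. The error prefix is likewise an assumption about how the MERGE INTO context might be attached.

```rust
use std::any::Any;
use std::sync::Arc;

pub trait TableProvider {
    // Default mirrors the "not implemented" hook; strings stand in for plans.
    fn merge_into(&self, _source_plan: &str, _on: &str, _clauses: usize) -> Result<String, String> {
        Err("not implemented".to_string())
    }
}

pub trait TableSource {
    fn as_any(&self) -> &dyn Any;
}

// Logical-side wrapper that carries the provider (simplified stand-in).
pub struct DefaultTableSource {
    pub provider: Arc<dyn TableProvider>,
}
impl TableSource for DefaultTableSource {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// A source that does not wrap a provider, to show the failure path.
pub struct OpaqueSource;
impl TableSource for OpaqueSource {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Recover the provider by downcasting the source to the known wrapper.
pub fn source_as_provider(source: &Arc<dyn TableSource>) -> Result<Arc<dyn TableProvider>, String> {
    source
        .as_any()
        .downcast_ref::<DefaultTableSource>()
        .map(|s| Arc::clone(&s.provider))
        .ok_or_else(|| "TableSource is not a DefaultTableSource".to_string())
}

pub enum WriteOp {
    InsertAppend,
    MergeInto { on: String, clauses: usize },
}

// Dispatch: only the MergeInto arm is sketched; other arms are elided.
pub fn plan_dml(target: &Arc<dyn TableSource>, op: &WriteOp, source_plan: &str) -> Result<String, String> {
    match op {
        WriteOp::MergeInto { on, clauses } => {
            let provider = source_as_provider(target)?;
            provider
                .merge_into(source_plan, on, *clauses)
                // Attach operation context to any provider error.
                .map_err(|e| format!("MERGE INTO failed: {e}"))
        }
        WriteOp::InsertAppend => Err("other write ops are elided in this sketch".to_string()),
    }
}

pub struct NoMerge; // inherits the default: merge is unsupported
impl TableProvider for NoMerge {}

pub struct EchoMerge; // pretends to plan the merge
impl TableProvider for EchoMerge {
    fn merge_into(&self, source_plan: &str, on: &str, clauses: usize) -> Result<String, String> {
        Ok(format!("merge {source_plan} on {on} with {clauses} clauses"))
    }
}
```

A provider that keeps the default hook surfaces as an error carrying the MERGE INTO prefix, while a source that cannot be downcast fails before the provider is ever called.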

Are these changes tested?

  • The SQL planner path is exercised by datafusion/proto/tests/cases/roundtrip_logical_plan.rs (proto round-trip for MergeInto).
  • Unit tests for MergeIntoOp::exprs / with_new_exprs are included in dml.rs.
  • End-to-end integration tests require a TableProvider that implements merge_into; that is left to follow-up once a concrete provider (e.g. Delta Lake) adopts the hook.

Are there any user-facing changes?

  • TableProvider gains a new merge_into method. The default implementation returns not_impl_err, so existing implementations compile without changes.
  • MERGE INTO <table> USING <source> ON <cond> WHEN ... SQL syntax is now accepted by the DataFusion SQL parser and planner.

Add merge_into async method to TableProvider trait for MERGE INTO
DML support. The method accepts:
- source: ExecutionPlan representing the USING clause
- on: Expr representing the ON join condition
- clauses: Vec<MergeIntoClause> for WHEN MATCHED/NOT MATCHED actions

Default implementation returns not_impl_err for tables that don't
support MERGE INTO operations.

Implement merge_to_plan and merge_clause_to_plan in SQL planner:
- Parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto
- Resolve target table and plan source (USING clause) as LogicalPlan
- Build combined schema for target + source to resolve ON and WHEN expressions
- Convert ON condition and WHEN clauses to DataFusion Expr
- Handle UPDATE, INSERT, and DELETE actions in WHEN clauses

Add physical planner dispatch for WriteOp::MergeInto:
- Use source_as_provider() to recover the TableProvider from the TableSource
- Extract source ExecutionPlan from children
- Call TableProvider::merge_into with source plan, ON condition, and clauses
- Wrap errors with MERGE INTO operation context

Wire MergeInto's expressions through LogicalPlan tree-traversal so
optimizers can rewrite them: add MergeIntoOp::exprs() (stable iteration
order: on, then per-clause predicate + action value Exprs) and
MergeIntoOp::with_new_exprs() to rebuild the op from a transformed
expr vector. Branch LogicalPlan::apply_expressions, map_expressions,
and with_new_exprs on WriteOp::MergeInto to use these helpers; other
WriteOp variants continue to expose no expressions as before.

@github-actions (bot) added the sql, logical-expr, core, and catalog labels on Jun 17, 2026