feat: Add SQL planner, physical planner, and TableProvider hook for MERGE INTO #22988
Open
wirybeaver wants to merge 2 commits into
Add merge_into async method to TableProvider trait for MERGE INTO DML support.

The method accepts:
- source: ExecutionPlan representing the USING clause
- on: Expr representing the ON join condition
- clauses: Vec<MergeIntoClause> for WHEN MATCHED/NOT MATCHED actions

Default implementation returns not_impl_err for tables that don't support MERGE INTO operations.
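The default-method pattern described in this commit can be sketched as follows. This is a std-only illustration, not the DataFusion API: `Table`, `SourcePlan`, `Expr`, `Clause`, `PlanError`, and the `String` return type are hypothetical stand-ins, and the method is synchronous here even though the real hook is async.

```rust
// Simplified stand-ins for illustration; these are NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct SourcePlan(String); // stands in for the USING-clause ExecutionPlan

#[derive(Debug, Clone, PartialEq)]
struct Expr(String); // stands in for a DataFusion Expr

#[derive(Debug, Clone, PartialEq)]
enum Clause {
    MatchedUpdate,
    MatchedDelete,
    NotMatchedInsert,
}

#[derive(Debug, PartialEq)]
enum PlanError {
    NotImplemented(String),
}

// Hypothetical trait mirroring the shape of the hook (synchronous for brevity).
trait Table {
    fn name(&self) -> &str;

    // Default body: tables that do not override this report "not implemented",
    // so existing implementations keep compiling unchanged.
    fn merge_into(
        &self,
        _source: &SourcePlan,
        _on: &Expr,
        _clauses: &[Clause],
    ) -> Result<String, PlanError> {
        Err(PlanError::NotImplemented(format!(
            "MERGE INTO not supported for table {}",
            self.name()
        )))
    }
}

// Relies on the default and therefore rejects MERGE INTO.
struct ReadOnlyTable;
impl Table for ReadOnlyTable {
    fn name(&self) -> &str {
        "read_only"
    }
}

// Overrides the hook; the returned String stands in for a physical plan node.
struct MergeableTable;
impl Table for MergeableTable {
    fn name(&self) -> &str {
        "mergeable"
    }

    fn merge_into(
        &self,
        source: &SourcePlan,
        on: &Expr,
        clauses: &[Clause],
    ) -> Result<String, PlanError> {
        Ok(format!(
            "Merge(target={}, source={}, on={}, clauses={})",
            self.name(),
            source.0,
            on.0,
            clauses.len()
        ))
    }
}
```

Calling `merge_into` on `ReadOnlyTable` yields the not-implemented error, while `MergeableTable` returns its placeholder plan description. Only providers that opt in need to change.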
Implement merge_to_plan and merge_clause_to_plan in SQL planner:
- Parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto
- Resolve target table and plan source (USING clause) as LogicalPlan
- Build combined schema for target + source to resolve ON and WHEN expressions
- Convert ON condition and WHEN clauses to DataFusion Expr
- Handle UPDATE, INSERT, and DELETE actions in WHEN clauses

Add physical planner dispatch for WriteOp::MergeInto:
- Use source_as_provider() to recover the TableProvider from the TableSource
- Extract source ExecutionPlan from children
- Call TableProvider::merge_into with source plan, ON condition, and clauses
- Wrap errors with MERGE INTO operation context

Wire MergeInto's expressions through LogicalPlan tree-traversal so optimizers can rewrite them: add MergeIntoOp::exprs() (stable iteration order: on, then per-clause predicate + action value Exprs) and MergeIntoOp::with_new_exprs() to rebuild the op from a transformed expr vector. Branch LogicalPlan::apply_expressions, map_expressions, and with_new_exprs on WriteOp::MergeInto to use these helpers; other WriteOp variants continue to expose no expressions as before.
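The flatten-and-rebuild contract between exprs() and with_new_exprs() can be illustrated with a small std-only model. The `MergeOp`, `Clause`, `Action`, and `Expr` types below are hypothetical simplifications, not the PR's actual definitions. The sketch only assumes the ordering stated above: ON first, then each clause's predicate followed by its action values.

```rust
// Simplified stand-ins for illustration; NOT the PR's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Expr(String);

#[derive(Debug, Clone, PartialEq)]
enum Action {
    Update(Vec<(String, Expr)>), // (column, value) assignments
    Insert(Vec<Expr>),           // inserted values
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
struct Clause {
    predicate: Option<Expr>,
    action: Action,
}

#[derive(Debug, Clone, PartialEq)]
struct MergeOp {
    on: Expr,
    clauses: Vec<Clause>,
}

impl MergeOp {
    // Flatten every expression in a stable order:
    // ON, then per clause: predicate (if any), then action values.
    fn exprs(&self) -> Vec<&Expr> {
        let mut out = vec![&self.on];
        for clause in &self.clauses {
            out.extend(clause.predicate.iter());
            match &clause.action {
                Action::Update(assignments) => out.extend(assignments.iter().map(|(_, v)| v)),
                Action::Insert(values) => out.extend(values.iter()),
                Action::Delete => {}
            }
        }
        out
    }

    // Rebuild the op by consuming `new_exprs` in exactly the order `exprs()` emits.
    fn with_new_exprs(&self, new_exprs: Vec<Expr>) -> Result<MergeOp, String> {
        let expected = self.exprs().len();
        if new_exprs.len() != expected {
            return Err(format!(
                "expected {expected} expressions, got {}",
                new_exprs.len()
            ));
        }
        let mut it = new_exprs.into_iter();
        let mut next = move || it.next().expect("length checked above");
        let on = next();
        let clauses = self
            .clauses
            .iter()
            .map(|clause| {
                let predicate = clause.predicate.as_ref().map(|_| next());
                let action = match &clause.action {
                    Action::Update(assignments) => Action::Update(
                        assignments
                            .iter()
                            .map(|(col, _)| (col.clone(), next()))
                            .collect(),
                    ),
                    Action::Insert(values) => {
                        Action::Insert(values.iter().map(|_| next()).collect())
                    }
                    Action::Delete => Action::Delete,
                };
                Clause { predicate, action }
            })
            .collect();
        Ok(MergeOp { on, clauses })
    }
}
```

An optimizer-style rewrite then amounts to mapping over `exprs()` and passing the result to `with_new_exprs()`. Feeding the unmodified expressions back in should reproduce the original op, which is the invariant the stable ordering exists to protect.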
Which issue does this PR close?

Follow-up to #20763 (merged), which added MergeIntoOp, MergeIntoClause, and proto types.

Rationale for this change

MERGE INTO (SQL:2003) is a widely used DML statement for upsert and conditional-update workloads. This PR wires the types introduced in #20763 through the SQL planner, physical planner, and TableProvider trait so that table implementations can actually execute merge operations.

What changes are included in this PR?

datafusion/catalog: TableProvider trait extension
- Add a merge_into(source, on, clauses) async method with a default not_impl_err impl so existing providers are unaffected.

datafusion/sql: SQL → LogicalPlan
- statement.rs: parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto.
- Resolve the target table and plan the USING source into a LogicalPlan.
- Build a combined target + source schema to resolve ON and WHEN expressions.
- Convert the ON condition and WHEN MATCHED / NOT MATCHED clauses to DataFusion Expr.

datafusion/expr: expression plumbing
- MergeIntoOp::exprs(): stable iteration over all expressions (ON, then per-clause predicate + action values).
- MergeIntoOp::with_new_exprs(): rebuild op from a transformed expr vector.
- Branch LogicalPlan::apply_expressions, map_expressions, and with_new_exprs on WriteOp::MergeInto so optimizers can rewrite merge expressions. Other WriteOp variants are unchanged.

datafusion/core: physical planner dispatch
- Dispatch WriteOp::MergeInto in the physical planner.
- Recover the TableProvider via source_as_provider(), extract the source ExecutionPlan, and call TableProvider::merge_into.

Are these changes tested?

- datafusion/proto/tests/cases/roundtrip_logical_plan.rs (proto round-trip for MergeInto).
- Tests for MergeIntoOp::exprs / with_new_exprs are included in dml.rs.
- No test yet exercises a TableProvider that implements merge_into; that is left to a follow-up once a concrete provider (e.g. Delta Lake) adopts the hook.

Are there any user-facing changes?

- TableProvider gains a new merge_into method. The default implementation returns not_impl_err, so existing implementations compile without changes.
- MERGE INTO <table> USING <source> ON <cond> WHEN ... SQL syntax is now accepted by the DataFusion SQL parser and planner.
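
For readers who want a feel for the physical-planner dispatch described under datafusion/core, here is a rough std-only sketch. Everything in it is a hypothetical stand-in: `WriteOp`, `Provider`, `PlanError`, and `plan_dml` are simplified and do not match the real planner's signatures. It shows only the control flow: match on the merge variant, take the source plan from the children, call the provider hook, and wrap any error with MERGE INTO context.

```rust
// Simplified stand-ins for illustration; NOT the real DataFusion planner types.
#[derive(Debug, PartialEq)]
enum WriteOp {
    Insert,
    Delete,
    MergeInto,
}

#[derive(Debug, PartialEq)]
enum PlanError {
    NotImplemented(String),
    // An inner error annotated with the operation that triggered it.
    Context(String, Box<PlanError>),
}

// Hypothetical provider with a default "not implemented" merge hook.
trait Provider {
    fn merge_into(&self, _source: &str) -> Result<String, PlanError> {
        Err(PlanError::NotImplemented("merge_into".to_string()))
    }
}

struct Unsupported;
impl Provider for Unsupported {}

struct Supported;
impl Provider for Supported {
    fn merge_into(&self, source: &str) -> Result<String, PlanError> {
        // The String stands in for the physical plan returned by the provider.
        Ok(format!("MergeExec(source={source})"))
    }
}

// Dispatch one DML op; `children` stands in for the already-planned inputs.
fn plan_dml(
    op: &WriteOp,
    provider: &dyn Provider,
    children: &[&str],
) -> Result<String, PlanError> {
    match op {
        WriteOp::MergeInto => {
            // The source (USING clause) plan is expected to be the first child.
            let source = children.first().ok_or_else(|| {
                PlanError::NotImplemented("MERGE INTO without a source plan".to_string())
            })?;
            provider.merge_into(source).map_err(|e| {
                PlanError::Context("MERGE INTO operation".to_string(), Box::new(e))
            })
        }
        other => Err(PlanError::NotImplemented(format!(
            "{other:?} is out of scope for this sketch"
        ))),
    }
}
```

With `Unsupported`, the default hook's error surfaces wrapped in the `MERGE INTO operation` context, which is the behavior a user would see when running MERGE against a table that has not adopted the hook.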