Open
Description
Summary
Several DataFrame methods from upstream DataFusion v53 are not yet exposed in datafusion-python. This issue covers set operations and query-related methods.
Missing Methods
Set operations:
- `distinct_on`: deduplicate rows based on specific columns, keeping the first row per group
- `except_distinct`: set difference with deduplication (complement to the existing `except_all`)
- `intersect_distinct`: set intersection with deduplication (complement to the existing `intersect`)
- `union_by_name`: union two DataFrames, matching columns by name rather than by position
- `union_by_name_distinct`: union by name with deduplication
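To make the intended semantics of these set operations concrete, here is a minimal pure-Python sketch that models a DataFrame as a list of dicts. The helper names mirror the method names above, but they are hypothetical illustrations, not the datafusion-python or upstream Rust API. Two assumptions are baked in: "first row per group" in `distinct_on` is taken to mean first in input order (a real query engine gives no such order guarantee without an explicit sort), and `union_by_name` fills a column missing on one side with `None` (NULL), modelled on DuckDB-style `UNION BY NAME`.

```python
from typing import Any, Dict, List, Sequence

# A "table" here is just a list of rows; each row maps column name -> value.
# All helpers below are hypothetical illustrations, not a real DataFrame API.
Row = Dict[str, Any]


def _key(row: Row, columns: Sequence[str]) -> tuple:
    # Hashable tuple of the row's values for `columns`, in that order.
    return tuple(row[c] for c in columns)


def distinct_on(rows: List[Row], on: Sequence[str]) -> List[Row]:
    # Keep the first row (in input order) for each distinct combination of `on`.
    seen = set()
    out = []
    for row in rows:
        k = _key(row, on)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


def except_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    # Rows of `left` that do not appear in `right`, deduplicated (SQL EXCEPT).
    cols = list(left[0]) if left else []
    right_keys = {_key(r, cols) for r in right}
    kept = [r for r in left if _key(r, cols) not in right_keys]
    return distinct_on(kept, cols)


def intersect_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    # Rows of `left` that also appear in `right`, deduplicated (SQL INTERSECT).
    cols = list(left[0]) if left else []
    right_keys = {_key(r, cols) for r in right}
    kept = [r for r in left if _key(r, cols) in right_keys]
    return distinct_on(kept, cols)


def union_by_name(left: List[Row], right: List[Row]) -> List[Row]:
    # Output columns in first-seen order (left's columns, then right-only ones).
    # ASSUMPTION: a column absent from a row is filled with None (NULL).
    columns: List[str] = []
    for row in left + right:
        for c in row:
            if c not in columns:
                columns.append(c)
    return [{c: row.get(c) for c in columns} for row in left + right]


def union_by_name_distinct(left: List[Row], right: List[Row]) -> List[Row]:
    # Union by name, then drop fully duplicated rows.
    rows = union_by_name(left, right)
    cols = list(rows[0]) if rows else []
    return distinct_on(rows, cols)
```

For example, `union_by_name([{"a": 1, "b": 2}], [{"b": 3, "c": 4}])` lines the two inputs up on `b` and pads `a` and `c` with `None`, whereas a positional union would require both inputs to have the same column count and order.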
Query/display:
- `explain_with_options`: explain plan with configurable detail options
- `show_limit`: display results with a custom row limit
- `sort_by`: sort by column names (a simpler API than `sort`, which requires `Expr`)
- `with_param_values`: bind parameter values for prepared statements
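Binding parameter values for a prepared statement can be pictured as a tree rewrite: each positional placeholder (`$1`, `$2`, ...) is replaced by a literal taken from the supplied values. The sketch below models this on a toy expression tree. `Column`, `Placeholder`, `Literal`, `BinaryExpr`, and the `with_param_values` function here are hypothetical stand-ins, not DataFusion types, and the sketch ignores checking values against the placeholders' declared types, which a real implementation would likely also need.

```python
from dataclasses import dataclass
from typing import Any, List, Union

# Hypothetical toy expression nodes; not DataFusion's Expr types.


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Placeholder:
    name: str  # positional placeholder such as "$1"


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class BinaryExpr:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Column, Placeholder, Literal, BinaryExpr]


def with_param_values(expr: Expr, params: List[Any]) -> Expr:
    """Return a copy of `expr` with each `$n` replaced by Literal(params[n - 1])."""
    if isinstance(expr, Placeholder):
        index = int(expr.name.lstrip("$")) - 1
        if not 0 <= index < len(params):
            raise ValueError(f"no value supplied for {expr.name}")
        return Literal(params[index])
    if isinstance(expr, BinaryExpr):
        # Rewrite both children; the operator is kept as-is.
        return BinaryExpr(
            with_param_values(expr.left, params),
            expr.op,
            with_param_values(expr.right, params),
        )
    # Columns and literals contain no placeholders, so they are unchanged.
    return expr
```

For instance, binding `[100]` into `price > $1` yields `price > 100`; because the tree is rebuilt rather than mutated, the same unbound expression can be reused with different parameter values.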
Upstream Reference
Implementation
- Rust bindings: `crates/core/src/dataframe.rs`
- Python wrappers: `python/datafusion/dataframe.py`
Note: This gap analysis was performed using an AI agent comparing upstream DataFusion v53 documentation against the current datafusion-python codebase.