Add SQL and physical planner support for MERGE INTO#2
Closed
wirybeaver wants to merge 39 commits into
This was referenced Jun 11, 2026
…orr (apache#22764)

## Which issue does this PR close?

- Closes apache#22763.

## Rationale for this change

Bounded sliding window queries using `covar_pop`, `covar_samp`, and `corr` currently fail with a `retract_batch is not implemented` error, preventing these aggregates from being used with sliding window frames.

## What changes are included in this PR?

* Included `supports_retract_batch()` for the covariance and correlation accumulators.
* Added SQL logic tests covering bounded sliding window execution for covariance and correlation aggregates.

## Are these changes tested?

Yes. Added SQL logic tests covering:

* Single-row bounded sliding frames
* Multi-row bounded sliding frames for `covar_pop`, `covar_samp`, and `corr`.

## Are there any user-facing changes?

Yes. `covar_pop`, `covar_samp`, and `corr` can now be used with bounded sliding window frames that previously failed. Also, no changes were made to any public APIs.
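To illustrate what retraction means for a sliding frame, here is a minimal sketch in plain Rust. It is not DataFusion's accumulator: `CovarState` and its methods are hypothetical names, and the running-sums formula is an assumption chosen for brevity. The point is only that rows leaving the frame are subtracted instead of recomputing the whole window.

```rust
/// Hypothetical running state for covariance over a sliding window.
/// Illustration only: it keeps plain sums and is not DataFusion's implementation.
#[derive(Debug, Default, Clone, Copy)]
pub struct CovarState {
    count: u64,
    sum_x: f64,
    sum_y: f64,
    sum_xy: f64,
}

impl CovarState {
    /// A row enters the window frame.
    pub fn update(&mut self, x: f64, y: f64) {
        self.count += 1;
        self.sum_x += x;
        self.sum_y += y;
        self.sum_xy += x * y;
    }

    /// A row leaves the window frame: undo exactly what `update` did.
    pub fn retract(&mut self, x: f64, y: f64) {
        self.count -= 1;
        self.sum_x -= x;
        self.sum_y -= y;
        self.sum_xy -= x * y;
    }

    /// Population covariance, or `None` for an empty frame.
    pub fn covar_pop(&self) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        let n = self.count as f64;
        Some(self.sum_xy / n - (self.sum_x / n) * (self.sum_y / n))
    }

    /// Sample covariance, or `None` when fewer than two rows are in the frame.
    pub fn covar_samp(&self) -> Option<f64> {
        if self.count < 2 {
            return None;
        }
        let n = self.count as f64;
        Some((self.sum_xy - self.sum_x * self.sum_y / n) / (n - 1.0))
    }
}
```

Sliding a two-row frame forward is one `update` for the entering row plus one `retract` for the leaving row. Plain sums can lose precision on large inputs, so a production accumulator would likely use a more stable update rule.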
…22610)

## Which issue does this PR close?

- Closes apache#22528

## Rationale for this change

`date_bin` had a few edge cases that could return the wrong result, return an error only on array inputs, or panic/wrap when scaling timestamp and time values to nanoseconds.

## What changes are included in this PR?

- Fix negative sub-second timestamp conversion before the epoch.
- Make scalar and array paths return `NULL` consistently for per-row binning errors.
- Use checked scaling when converting timestamp and time values to nanoseconds.
- Return an error for invalid shared origin values that overflow during scaling.
- Simplify duplicated stride and scale handling.

## Are these changes tested?

Yes

## Are there any user-facing changes?

No public API changes.
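The two arithmetic hazards named above (overflow while scaling to nanoseconds, and rounding toward zero for timestamps before the epoch) can be sketched in plain Rust. The function names `to_nanos` and `bin_nanos` are hypothetical and this is not the `date_bin` code itself; it only shows checked scaling and floor-style binning with standard integer methods.

```rust
/// Scale a timestamp to nanoseconds (`scale` is e.g. 1_000_000 for milliseconds).
/// Returns `None` instead of wrapping or panicking on overflow.
pub fn to_nanos(value: i64, scale: i64) -> Option<i64> {
    value.checked_mul(scale)
}

/// Floor `ts` to the start of its bin of width `stride` nanoseconds, counted from `origin`.
/// Returns `None` for a non-positive stride or on overflow.
pub fn bin_nanos(ts: i64, stride: i64, origin: i64) -> Option<i64> {
    if stride <= 0 {
        return None;
    }
    let delta = ts.checked_sub(origin)?;
    // `rem_euclid` is always in `0..stride`, so timestamps before the origin
    // are rounded down to the previous bin rather than toward zero.
    let rem = delta.rem_euclid(stride);
    ts.checked_sub(rem)
}
```

For example, -1500 ms scales to -1_500_000_000 ns and, with a one-second stride and an origin of 0, lands in the bin starting at -2_000_000_000 ns. Truncating division would wrongly give -1_000_000_000.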
## Which issue does this PR close? - Part of apache#15914 ## Rationale for this change DataFusion core's `concat_ws` does not support array arguments. Spark's `concat_ws(sep, ...)` accepts both scalar strings and arrays, expanding array elements and skipping nulls. This is needed for Spark compatibility in the `datafusion-spark` crate. ## What changes are included in this PR? - New `SparkConcatWs` UDF in `datafusion/spark/src/function/string/concat_ws.rs` - Supports `concat_ws(sep, str1, str2, ...)` with scalar strings - Supports array arguments: `concat_ws(',', array('a', 'b'), 'c')` → `"a,b,c"` - Null scalars and null array elements are skipped (Spark behavior) - Null separator returns NULL - Zero value arguments (`concat_ws(',')`) returns empty string - Supports Utf8, LargeUtf8, Utf8View, List, and LargeList types - Registered the function in `mod.rs` (`make_udf_function!`, `export_functions!`, `functions()`) - Replaced commented-out SLT tests with 14 working test cases covering basic usage, arrays, mixed arguments, nulls, column expressions, and edge cases ## Are these changes tested? Yes. - 7 unit tests in `concat_ws.rs` (basic, null values skipped, null separator, list arrays, list with nulls, mixed scalar+list, multiple rows) - 14 SLT tests in `spark/string/concat_ws.slt` covering scalars, arrays, nulls, column expressions, and edge cases ## Are there any user-facing changes? No. This is a new function in the `datafusion-spark` crate only.
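The null and array rules listed in this description can be sketched on plain Rust values. The `Arg` enum and this free function are hypothetical stand-ins for illustration; the real `SparkConcatWs` UDF operates on Arrow arrays and is not shown here.

```rust
/// Hypothetical argument model: a nullable string or a nullable list of nullable strings.
pub enum Arg {
    Scalar(Option<String>),
    List(Option<Vec<Option<String>>>),
}

/// Join all non-null values with `sep`, expanding list arguments in place.
/// A null separator yields `None`; zero value arguments yield an empty string.
pub fn concat_ws(sep: Option<&str>, args: &[Arg]) -> Option<String> {
    let sep = sep?;
    let mut parts: Vec<&str> = Vec::new();
    for arg in args {
        match arg {
            Arg::Scalar(Some(s)) => parts.push(s),
            // `flatten` drops the null elements inside the list.
            Arg::List(Some(items)) => parts.extend(items.iter().flatten().map(String::as_str)),
            // Null scalars and null lists are skipped entirely.
            Arg::Scalar(None) | Arg::List(None) => {}
        }
    }
    Some(parts.join(sep))
}
```

With this model, the `concat_ws(',', array('a', 'b'), 'c')` case from the description corresponds to one `Arg::List` followed by one `Arg::Scalar`, and produces `a,b,c`.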
## Which issue does this PR close? - Closes apache#22717. ## Rationale for this change Spark does not trim `dayOfWeek` before matching it in `next_day`, but `datafusion-spark` currently does. That makes values like `' MO '` succeed in DataFusion even though Spark treats them as invalid. ## What changes are included in this PR? - remove the `.trim()` call from `spark_next_day` - add a regression test proving whitespace-padded day names are rejected ## Are these changes tested? - `cargo test -p datafusion-spark next_day_rejects_whitespace_padded_day_names -- --nocapture` - `cargo test -p datafusion-spark` - `cargo fmt --all --check` - `cargo clippy -p datafusion-spark --all-targets --all-features --no-deps -- -D warnings` Note: the broader package clippy invocation still reports an existing unused import warning in untouched `datafusion/core/src/execution/session_state.rs` on current main. ## Are there any user-facing changes? Behavior now matches Spark for whitespace-padded `dayOfWeek` inputs in `next_day`. --------- Signed-off-by: xfocus3 <xfocus3@users.noreply.github.com> Co-authored-by: xfocus3 <xfocus3@users.noreply.github.com> Co-authored-by: Ahmed El amraouiyine <ahmed.elamraouiyine@vilavi.fr>
…pache#22937)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

`cargo audit` currently reports the following vulnerabilities:

```
Crate: postgres-protocol
Version: 0.6.11
Title: Unbounded SCRAM iteration count allows a malicious server to cause CPU-exhaustion denial of service
Date: 2026-06-12
ID: RUSTSEC-2026-0179
URL: https://rustsec.org/advisories/RUSTSEC-2026-0179
Severity: 8.7 (high)
Solution: Upgrade to >=0.6.12

Crate: postgres-protocol
Version: 0.6.11
Title: Panic decoding a malformed `hstore` value allows denial of service
Date: 2026-06-12
ID: RUSTSEC-2026-0180
URL: https://rustsec.org/advisories/RUSTSEC-2026-0180
Severity: 6.9 (medium)
Solution: Upgrade to >=0.6.12

Crate: tokio-postgres
Version: 0.7.17
Title: Panic on a `DataRow` with fewer fields than columns allows denial of service
Date: 2026-06-12
ID: RUSTSEC-2026-0178
URL: https://rustsec.org/advisories/RUSTSEC-2026-0178
Severity: 6.9 (medium)
Solution: Upgrade to >=0.7.18
```

## What changes are included in this PR?

Upgrade the minimal version of the `tokio-postgres` dependency

## Are these changes tested?

Existing tests

## Are there any user-facing changes?

None

Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
## Which issue does this PR close?

- Closes apache#22820.

## Rationale for this change

Sliding `SUM(DISTINCT)` only supports `Int64`, but it was routed through the wider `SUM` type dispatch path. This made unsupported types fail with a less clear accumulator error.

## What changes are included in this PR?

This PR adds an explicit `Int64` gate for sliding `SUM(DISTINCT)`. Unsupported types now return a clear feature error that names the operation and type. The existing `Int64` path is unchanged.

## Are these changes tested?

Yes

## Are there any user-facing changes?

No public API change
## Which issue does this PR close? - Part of apache#22330. This is the first of the per-method PRs that issue describes. It plumbs `placement` only; the remaining defaulted methods follow separately, so the umbrella issue stays open. ## Rationale for this change `FFI_ScalarUDF` (`datafusion/ffi/src/udf/mod.rs`) carried no function pointer for `placement`, and `ForeignScalarUDF` did not override it, so a producer's override of `ScalarUDFImpl::placement` (default body at `datafusion/expr/src/udf.rs:1028`) was dropped on the consumer side and every foreign UDF fell back to `KeepInPlace`. A UDF loaded over FFI never delivered its leaf-pushdown hint to the optimizer. ## What changes are included in this PR? - New `FFI_ExpressionPlacement` enum bridge in `datafusion/ffi/src/placement.rs`, in the shape of `FFI_Volatility`: `#[repr(u8)]` with `From` impls both ways and a round-trip test over every variant. - A `placement` function pointer on `FFI_ScalarUDF`, populated in the `From<Arc<ScalarUDF>>` constructor, with `placement_fn_wrapper` on the producer side and a forwarding `ForeignScalarUDF::placement` on the consumer side. `placement` is infallible, so the pointer returns the enum directly rather than `FFI_Result`. Adding a field to the `#[repr(C)]` struct changes its layout, so this is an API change and should carry the `api change` label (I can't add it myself). It targets `main` and should not be back-ported to a release branch. `display_name` is also on the issue's list, but it has been deprecated since 50.0.0, so it should be dropped from the gap list rather than plumbed. I have left it and the remaining methods to follow-up PRs. ## Are these changes tested? Yes. - Unit: a round-trip test over all four `ExpressionPlacement` variants, plus a forced-foreign test (`mock_foreign_marker_id`) using a UDF whose `placement` override depends on its arguments. 
The assertions cover ordered, reordered, and empty argument slices, so argument marshalling is checked, not just the return value. - Integration: `tests/ffi_udf.rs` loads the UDF from the real cdylib and asserts the override survives the boundary, which is the surface a layout change needs. Run with `cargo test -p datafusion-ffi` and `cargo test -p datafusion-ffi --features integration-tests`. ## Are there any user-facing changes? A `placement` override on a `ScalarUDFImpl` is now preserved across the FFI boundary instead of being silently replaced by the default. This is an ABI change to `FFI_ScalarUDF`; consumers must be recompiled against the new layout. --------- Signed-off-by: Amogh Ramesh <ramogh2404@gmail.com>
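The enum bridge described above can be sketched as follows. Both enums are local stand-ins, not the real `ExpressionPlacement` or `FFI_ExpressionPlacement` types, and apart from `KeepInPlace` the variant names are assumptions. The sketch only shows the `#[repr(u8)]` plus two-way `From` shape.

```rust
/// Stand-in for the Rust-side enum. Variant names other than `KeepInPlace` are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    Literal,
    Column,
    MoveTowardsLeafNodes,
    KeepInPlace,
}

/// Stand-in for the FFI-side enum: fixed one-byte layout with explicit discriminants.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FfiPlacement {
    Literal = 0,
    Column = 1,
    MoveTowardsLeafNodes = 2,
    KeepInPlace = 3,
}

impl From<Placement> for FfiPlacement {
    fn from(value: Placement) -> Self {
        match value {
            Placement::Literal => FfiPlacement::Literal,
            Placement::Column => FfiPlacement::Column,
            Placement::MoveTowardsLeafNodes => FfiPlacement::MoveTowardsLeafNodes,
            Placement::KeepInPlace => FfiPlacement::KeepInPlace,
        }
    }
}

impl From<FfiPlacement> for Placement {
    fn from(value: FfiPlacement) -> Self {
        match value {
            FfiPlacement::Literal => Placement::Literal,
            FfiPlacement::Column => Placement::Column,
            FfiPlacement::MoveTowardsLeafNodes => Placement::MoveTowardsLeafNodes,
            FfiPlacement::KeepInPlace => Placement::KeepInPlace,
        }
    }
}
```

Exhaustive `match` arms in both directions mean that adding a variant to either enum becomes a compile error until the bridge is updated, which is what a round-trip test over every variant then confirms at runtime.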
## Which issue does this PR close? - Closes apache#22808. ## Rationale for this change Follow-up to apache#22791, as suggested in review by @2010YOUY01. That PR fixed a double-decrement bug where `EmitLeftUnmatched` did two jobs at once — deciding whether a partition emits unmatched-left rows (which decrements the shared `probe_threads_counter`) and performing the emit. Because the state is re-enterable (a ready batch can be flushed before the state advances to `Done`), the counter could be decremented twice, driving it to zero before all partitions finished probing and emitting spurious NULL-padded rows. apache#22791 patched this with a `probe_completed_reported` guard flag. This refactor makes "decrement exactly once per probe stream" a structural property of the state graph rather than a runtime guard, so the inner logic is easier to follow and the bug is harder to reintroduce. ## What changes are included in this PR? Restructures the state machine from `FetchingRight → EmitLeftUnmatched` to `FetchingRight → ProbeEnd → EmitLeftUnmatched`: - Adds a dedicated `ProbeEnd` state, entered exactly once per left chunk when the right side is exhausted. It owns the single `report_probe_completed()` call and records whether this stream is the unmatched-left emitter. - Replaces the `probe_completed_reported` guard flag with an `is_unmatched_left_emitter` field that `EmitLeftUnmatched` only reads. - Removes the per-chunk flag reset in the memory-limited path (the decision is recomputed in `ProbeEnd` for each chunk) and reverts the `Arc::clone` workaround apache#22791 needed in `process_left_unmatched`. - Updates the state-transition doc graph and arm comments. No behavior change is expected. ## Are these changes tested? Yes — covered by existing tests: - All 42 `nested_loop_join` unit tests and the full `datafusion-physical-plan` suite pass. - `joins.slt` sqllogictests pass (including the multi-partition LEFT JOIN regression test added in apache#22791). 
- 41 `join_fuzz` tests (`cargo test --features extended_tests`) comparing `NestedLoopJoinExec` against `HashJoinExec` across every join type, filtered and unfiltered, with a multi-partition probe side — the exact scenario class of the original bug — pass. - `cargo fmt` and `cargo clippy --all-targets --all-features -- -D warnings` are clean. ## Are there any user-facing changes? No.
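A toy version of the restructured state graph may help show why the decrement becomes structural. Everything below is a simplified sketch under assumptions: `ProbeStream`, `step`, and `pending_batches` are hypothetical, and the real operator is an async stream over record batches.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    FetchingRight,
    ProbeEnd,
    EmitLeftUnmatched,
    Done,
}

/// Hypothetical per-partition probe stream sharing one counter with its siblings.
pub struct ProbeStream<'a> {
    state: State,
    probe_threads_counter: &'a AtomicUsize,
    is_unmatched_left_emitter: bool,
    /// Ready batches that still need flushing; this makes `EmitLeftUnmatched` re-enterable.
    pending_batches: usize,
}

impl<'a> ProbeStream<'a> {
    pub fn new(probe_threads_counter: &'a AtomicUsize, pending_batches: usize) -> Self {
        Self {
            state: State::FetchingRight,
            probe_threads_counter,
            is_unmatched_left_emitter: false,
            pending_batches,
        }
    }

    pub fn is_unmatched_left_emitter(&self) -> bool {
        self.is_unmatched_left_emitter
    }

    /// Advance the state machine by one step and return the new state.
    pub fn step(&mut self, right_exhausted: bool) -> State {
        let next = match self.state {
            State::FetchingRight if right_exhausted => State::ProbeEnd,
            State::FetchingRight => State::FetchingRight,
            // Entered once per probe: the only place the shared counter is decremented.
            State::ProbeEnd => {
                let previous = self.probe_threads_counter.fetch_sub(1, Ordering::SeqCst);
                self.is_unmatched_left_emitter = previous == 1;
                State::EmitLeftUnmatched
            }
            // Re-entered while batches are pending; only reads the emitter decision.
            State::EmitLeftUnmatched if self.pending_batches > 0 => {
                self.pending_batches -= 1;
                State::EmitLeftUnmatched
            }
            State::EmitLeftUnmatched => State::Done,
            State::Done => State::Done,
        };
        self.state = next;
        next
    }
}
```

Because `EmitLeftUnmatched` never touches the counter, re-entering it any number of times cannot decrement twice, so no guard flag is needed.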
## Which issue does this PR close?

- Closes apache#22896.

## Rationale for this change

`TRY_CAST` should return NULL on cast failure, but overflowing date/timestamp casts returned errors.

## What changes are included in this PR?

- Make scalar temporal overflow checks respect `CastOptions.safe`.
- Skip DataFusion’s array pre-check for safe casts so Arrow can return NULLs.
- Add regression tests.

## Are these changes tested?

Yes:

```bash
cargo test -p datafusion-common timestamp_overflow_returns
cargo test -p datafusion-expr-common timestamp_array_to_timestamp_overflow
cargo test --test sqllogictests -- datetime/timestamps.slt
```

## Are there any user-facing changes?

Yes. `TRY_CAST` for overflowing date/timestamp casts now returns NULL; regular `CAST` still errors.

## Known Limitation

This PR does not add Date array-path coverage yet. For example:

```sql
SELECT TRY_CAST(d AS TIMESTAMP(9))
FROM (VALUES (DATE '3000-01-01')) t(d);
```

This depends on the upstream Arrow fix in apache/arrow-rs#9825. Once DataFusion updates to an Arrow version containing that fix, we can add this regression test.
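The difference between `CAST` and `TRY_CAST` on overflow can be sketched with a single scalar conversion. The function name and the `safe` flag below are hypothetical and only mirror the idea of `CastOptions.safe`; this is not the DataFusion or Arrow cast code.

```rust
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Convert days since the epoch to a nanosecond timestamp.
/// With `safe == true` (TRY_CAST-like), overflow becomes `Ok(None)`, i.e. NULL.
/// With `safe == false` (CAST-like), overflow is reported as an error.
pub fn date32_to_timestamp_nanos(days: i32, safe: bool) -> Result<Option<i64>, String> {
    match i64::from(days).checked_mul(NANOS_PER_DAY) {
        Some(nanos) => Ok(Some(nanos)),
        None if safe => Ok(None),
        None => Err(format!("date of {days} days overflows a nanosecond timestamp")),
    }
}
```

A 64-bit nanosecond timestamp covers roughly 106,751 days on either side of the epoch, so a date in the year 3000 (about 376,200 days after the epoch) overflows and exercises both branches.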
…ing (apache#22862)

## Which issue does this PR close?

- Closes apache#22861.

## Rationale for this change

When using DataFusion Comet I noticed that my hash join operator was failing with the following error: `Failed to acquire 142606336 bytes where 17142251456 bytes already reserved and the fair limit is 17179869184 bytes, 4 registered`.

Looking into this more, DataFusion asks to reserve memory for each batch (by default 8192 rows) of the build side of a hash join, and tries to reserve (without actually allocating it) num_batches * batch_size. This is problematic when these batches are zero-copy slices of a larger batch (e.g. from `GroupedHashAggregateStream`), since the slice size is evaluated to be the size of the larger buffer. This is because the reference to the slice actually keeps the entire buffer from being freed. DataFusion doesn't overallocate memory (the underlying data is the same), but it does over-request it (in the centralized accounting system), which can lead to these "ResourcesExhausted" exceptions.

## What changes are included in this PR?

In this change, we keep track of all of the buffers that we've already counted via a set of pointers. This way, we don't redundantly request memory for the whole Arrow buffer for each sub-slice of it. We choose this approach as opposed to just requesting a smaller amount of memory per batch because, as mentioned before, the pointer to each batch technically keeps the entire Arrow buffer from being freed.

## Are these changes tested?

The new hash join test fails on main with ResourcesExhausted and passes with this change.

## Are there any user-facing changes?

No breaking changes. Adds a new public helper `count_record_batch_memory_size` to datafusion-common.

Co-authored-by: Jordan Epstein <jordan.epstein@imc.com>
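The pointer-set idea can be sketched with `Arc`-shared byte buffers standing in for Arrow buffers. `Slice` and `count_unique_buffer_bytes` are hypothetical names used for illustration; this is not the `count_record_batch_memory_size` helper itself.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical zero-copy slice: a window into a shared, reference-counted buffer.
pub struct Slice {
    pub buffer: Arc<Vec<u8>>,
    pub offset: usize,
    pub len: usize,
}

/// Sum the size of each distinct underlying buffer once, however many slices share it.
/// `seen` holds buffer addresses already counted and can be reused across calls.
pub fn count_unique_buffer_bytes(slices: &[Slice], seen: &mut HashSet<usize>) -> usize {
    slices
        .iter()
        // `insert` returns false when the address is already present, so repeats are skipped.
        .filter(|slice| seen.insert(Arc::as_ptr(&slice.buffer) as usize))
        .map(|slice| slice.buffer.len())
        .sum()
}
```

Four 256-byte slices of one 1024-byte buffer would be charged 4096 bytes if each slice were counted at the full buffer size, but only 1024 bytes with the set. The whole buffer is still counted once, since any live slice keeps all of it allocated.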
…atch (apache#22852)

## Which issue does this PR close?

- Closes apache#22849
- A related cross-partition starvation case is tracked separately in apache#22874 and addressed by an upcoming follow-up PR; see [discussion](apache#22852 (comment)) for details

## Rationale for this change

`TopK::insert_batch` short-circuits when the heap's dynamic filter rejects every row in a batch:

```rust
if !filter.has_true() {
    // nothing to filter, so no need to update
    return Ok(());
}
```

The early-exit check `attempt_early_completion(&batch)` lives later in the same function, gated on `replacements > 0`. So a batch that the filter rejects entirely bypasses the check.

The heap's dynamic filter is derived from the heap's worst row (via `update_filter`). A batch whose rows all come from a strictly worse sort prefix is exactly the batch the filter rejects entirely, i.e. the very signal `attempt_early_completion` is designed to detect ("the next batch is past the heap's boundary, we can stop") is what causes the function to short-circuit *before* the check runs.

This is a feature-interaction regression between two PRs that were both correct in isolation. The `attempt_early_completion` mechanism was added by apache#15563 (closing apache#15529). At the time, there was no heap-derived dynamic filter on TopK, so the only sensible call site was right after a successful heap insertion. Two months later, apache#15770 added the dynamic-filter pushdown for TopK sorts, introducing the `!filter.has_true()` short-circuit. The two features address different problems and the new short-circuit didn't connect to the existing prefix-completion check, which is how this gap opened up.

**Consequence**: on a TopK over an input ordered on the sort prefix, `finished = true` is never set once the heap stabilizes. Since `finished` is the signal `SortExec` uses to stop pulling from its input (via `Poll::Ready(None)` from the TopK stream, which cascades into dropping the source stream), the source keeps being polled long past the point where no further row can improve the heap. The LIMIT optimization effectively degrades to "heap saves memory but reads everything"; sources with cancellable streams (e.g. networked sources) never receive the cancellation signal.

## What changes are included in this PR?

Single behavioral change in `datafusion/physical-plan/src/topk/mod.rs`: call `attempt_early_completion(&batch)` immediately before the `return Ok(())` in the `!filter.has_true()` branch.

Why this scope, not a broader restructuring:

- The existing `attempt_early_completion` call inside `if replacements > 0` is load-bearing for a related case: a batch containing a mix of "still valuable" rows and "past the boundary" rows. The existing `test_try_finish_marks_finished_with_prefix` test covers this case: Batch 2 with `a=[2,3], b=[10,20]` against a heap where `heap.max.a = 2`; the `(2, 10)` row must be inserted before the check on the `(3, 20)` last row triggers. Moving the call earlier would skip the insertion of valuable rows and break that test.
- The bug is specifically that the *short-circuit* path doesn't call the check. The fix targets exactly that path.
- A related but separate gap is not addressed here: when `filter.has_true() == true` but `replacements == 0` (the filter accepts some rows but `find_new_topk_items` ends up inserting none of them), the existing call inside `if replacements > 0` is also skipped. This requires a divergence between the heap's filter predicate and the row-byte comparison used inside `find_new_topk_items`, which shouldn't normally happen (the filter is derived from the heap's worst row using the same comparator). A deterministic synthetic repro would likely require concurrent heap updates from sibling partitions or boundary-value edge cases (NaN/NULL semantics, type coercion). Happy to send a follow-up if reviewers want it covered; the workload that motivated this fix was the filter-rejection case empirically.

## Are these changes tested?

Yes. Added a regression test `test_try_finish_fires_when_filter_rejects_entire_batch`. The assertion target is `topk.finished`, the flag that signals "stop pulling from the source" to upstream consumers (read by `TopKExec::poll_next` to emit `Poll::Ready(None)`). Asserting that the flag transitions on the fully-filter-rejected batch is equivalent to asserting that the source-stopping mechanism activates.

- Builds a TopK over a `(a, b)` sort with prefix `a`, k=3.
- Inserts a batch that fills the heap with rows from `a ∈ {1, 2}`; `update_filter` tightens the filter to `a < 2 OR (a = 2 AND b < 30)`.
- Inserts a second batch with all rows at `a = 3`; the filter rejects every row.
- Without the fix: `insert_batch` short-circuits, `topk.finished` stays `false`. Test fails.
- With the fix: `attempt_early_completion` fires (last-row prefix `a = 3` > heap.max prefix `a = 2`), `topk.finished` becomes `true`. Test passes.

The test also asserts the emitted top-K is unchanged from after batch 1, confirming no candidate row was incorrectly excluded by the early bail.

All 28 existing `topk::` tests continue to pass (including `test_try_finish_marks_finished_with_prefix`, which exercises the mixed-prefix case).

## Are there any user-facing changes?

No public API or output changes. The fix only changes when TopK marks itself `finished = true`; specifically, it now fires `attempt_early_completion` for batches that are entirely rejected by the heap's dynamic filter, where previously it would silently skip the check. Output of TopK is unchanged; only the early-exit behavior improves.
--------- Co-authored-by: Gabriel <45515538+gabotechs@users.noreply.github.com>
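The control flow of the fix can be reproduced in a toy model. The sketch below is an assumption-heavy simplification: rows are `(a, b)` integer pairs sorted ascending, the "heap" is a sorted `Vec`, and the filter is a plain tuple comparison. It shares method names with the description but is not the code in `topk/mod.rs`.

```rust
/// Toy TopK over `(a, b)` rows, ascending, with `a` as the sort prefix of the input.
pub struct TopK {
    k: usize,
    heap: Vec<(i64, i64)>, // kept sorted; the last element is the worst row
    pub finished: bool,
}

impl TopK {
    pub fn new(k: usize) -> Self {
        Self { k, heap: Vec::new(), finished: false }
    }

    /// Current top-k rows in sorted order.
    pub fn top(&self) -> &[(i64, i64)] {
        &self.heap
    }

    /// Worst retained row, available only once the heap is full.
    fn worst(&self) -> Option<(i64, i64)> {
        if self.heap.len() == self.k { self.heap.last().copied() } else { None }
    }

    pub fn insert_batch(&mut self, batch: &[(i64, i64)]) {
        // Dynamic filter derived from the worst row: keep only strictly better rows.
        let worst = self.worst();
        let passing: Vec<(i64, i64)> = batch
            .iter()
            .copied()
            .filter(|row| worst.map_or(true, |w| *row < w))
            .collect();
        if passing.is_empty() {
            // The fix: run the completion check before the short-circuit return.
            self.attempt_early_completion(batch);
            return;
        }
        for row in passing {
            self.heap.push(row);
            self.heap.sort();
            self.heap.truncate(self.k);
        }
        self.attempt_early_completion(batch);
    }

    /// If the batch's last row is past the heap's worst prefix, no later row can qualify.
    fn attempt_early_completion(&mut self, batch: &[(i64, i64)]) {
        if let (Some(worst), Some(last)) = (self.worst(), batch.last()) {
            if last.0 > worst.0 {
                self.finished = true;
            }
        }
    }
}
```

Removing the call inside the `passing.is_empty()` branch reproduces the bug in miniature: a second batch consisting only of `a = 3` rows is rejected in full and `finished` never flips.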
…locations (apache#22918)

## Which issue does this PR close?

- Closes None.

## Rationale for this change

This PR simplifies heap size estimation by using a macro for types that own no heap allocations. This removes a lot of redundant code.

## What changes are included in this PR?

See above.

## Are these changes tested?

Yes, previous tests are passing and more tests are added.

## Are there any user-facing changes?

No.
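A macro of the kind described might look like the sketch below. The trait name `HeapSize`, the macro name `impl_zero_heap_size`, and the chosen type list are all hypothetical; the PR's actual trait and macro are not shown in this description.

```rust
/// Hypothetical trait: bytes owned on the heap, excluding the value's own inline size.
pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

/// Implement `HeapSize` as zero for types that own no heap allocations.
macro_rules! impl_zero_heap_size {
    ($($t:ty),* $(,)?) => {
        $(
            impl HeapSize for $t {
                fn heap_size(&self) -> usize {
                    0
                }
            }
        )*
    };
}

impl_zero_heap_size!(bool, u8, i32, i64, u64, f64);

// Types that do own heap memory still need hand-written impls.
impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // The vector's own allocation plus whatever each element owns in turn.
        self.capacity() * std::mem::size_of::<T>()
            + self.iter().map(HeapSize::heap_size).sum::<usize>()
    }
}
```

One macro invocation replaces a separate, identical `impl` block per primitive type, which is the kind of redundancy the description refers to.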
…n to the new hash aggregation impl (apache#22899)

## Which issue does this PR close?

Part of apache#22710

## Rationale for this change

See the issue for background. This PR forward-ports the following optimization to the rewritten hash aggregation:

- apache#11627

After this migration, the performance is back, so this PR also changes the default of the temporary configuration `datafusion.execution.enable_migration_aggregate` to `true`; the new path will be used by default.

Local Clickbench_partitioned result (see `benchmarks/` for details), on an M4 Pro MacBook:

```
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃       main ┃ split-aggr-skip-partial ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0  │    0.68 ms │                 0.70 ms │    no change │
│ QQuery 1  │    7.66 ms │                 7.48 ms │    no change │
│ QQuery 2  │   25.79 ms │                25.72 ms │    no change │
│ QQuery 3  │   22.25 ms │                22.10 ms │    no change │
│ QQuery 4  │  182.82 ms │               188.57 ms │    no change │
│ QQuery 5  │  213.57 ms │               212.69 ms │    no change │
│ QQuery 6  │    0.66 ms │                 0.69 ms │    no change │
│ QQuery 7  │    8.54 ms │                 8.49 ms │    no change │
│ QQuery 8  │  245.27 ms │               246.04 ms │    no change │
│ QQuery 9  │  323.81 ms │               323.68 ms │    no change │
│ QQuery 10 │   48.95 ms │                48.70 ms │    no change │
│ QQuery 11 │   57.73 ms │                57.05 ms │    no change │
│ QQuery 12 │  211.82 ms │               210.91 ms │    no change │
│ QQuery 13 │  298.06 ms │               302.46 ms │    no change │
│ QQuery 14 │  219.03 ms │               217.94 ms │    no change │
│ QQuery 15 │  219.24 ms │               216.60 ms │    no change │
│ QQuery 16 │  485.78 ms │               493.53 ms │    no change │
│ QQuery 17 │  500.92 ms │               487.31 ms │    no change │
│ QQuery 18 │ 1087.29 ms │              1051.08 ms │    no change │
│ QQuery 19 │   19.10 ms │                19.45 ms │    no change │
│ QQuery 20 │  453.62 ms │               458.61 ms │    no change │
│ QQuery 21 │  454.90 ms │               459.08 ms │    no change │
│ QQuery 22 │  829.91 ms │               847.96 ms │    no change │
│ QQuery 23 │ 2561.67 ms │              2619.03 ms │    no change │
│ QQuery 24 │   31.76 ms │                31.78 ms │    no change │
│ QQuery 25 │   86.63 ms │                89.67 ms │    no change │
│ QQuery 26 │   31.37 ms │                32.67 ms │    no change │
│ QQuery 27 │  544.97 ms │               553.90 ms │    no change │
│ QQuery 28 │ 1822.22 ms │              1877.44 ms │    no change │
│ QQuery 29 │   27.76 ms │                29.00 ms │    no change │
│ QQuery 30 │  211.00 ms │               217.48 ms │    no change │
│ QQuery 31 │  206.02 ms │               211.34 ms │    no change │
│ QQuery 32 │  676.20 ms │               724.32 ms │ 1.07x slower │
│ QQuery 33 │ 1144.96 ms │              1161.21 ms │    no change │
│ QQuery 34 │ 1141.98 ms │              1147.83 ms │    no change │
│ QQuery 35 │  209.49 ms │               217.06 ms │    no change │
│ QQuery 36 │   44.38 ms │                44.10 ms │    no change │
│ QQuery 37 │   24.15 ms │                24.57 ms │    no change │
│ QQuery 38 │   29.67 ms │                30.00 ms │    no change │
│ QQuery 39 │   87.68 ms │                88.80 ms │    no change │
│ QQuery 40 │    8.57 ms │                 8.95 ms │    no change │
│ QQuery 41 │    8.62 ms │                 8.38 ms │    no change │
│ QQuery 42 │    7.42 ms │                 7.20 ms │    no change │
└───────────┴────────────┴─────────────────────────┴──────────────┘
```

## What changes are included in this PR?

This PR is easier to read commit-by-commit.

1. Clean up the state machine in hash aggregation with the typestate pattern
2. Move the common util for partial hash aggregation skip from `aggregates/row_hash.rs` to `aggregates/utils.rs`
3. Implement the same optimization in the migrated aggregation
4. Set the default of the `enable_migration_aggregate` configuration to true

## Are these changes tested?

Existing tests + new UT

## Are there any user-facing changes?

No
…ream` (apache#22953)

## Which issue does this PR close?

Part of apache#22710

## Rationale for this change

The goal is that the existing UTs are kept once we have fully migrated from the old `row_hash.rs`, specifically all tests that involve `GroupedHashAggregateStream`.

Three earlier migration PRs have already been merged, and some existing UTs apply to them; this PR migrates those tests to the new implementation.

The test migration consists of:

1. Copy and paste the test case
2. Change `GroupedHashAggregateStream` to `PartialHashAggregateStream` (or another stream in the new impl)
3. Leave a comment on the migrated test case, so that the final deletion makes clear which tests have already been moved.

This PR moves 2 applicable UTs and updates the comments for all the tests moved previously.

(Just some random thoughts: in general I don't think it's a good idea to write tests against low-level utilities like `GroupedHashAggregateStream`. Tests are better written at the SQL level, or at least at the `ExecutionPlan` level, so their goals are more likely to survive refactors.)

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?
…ache#22816)

## Which issue does this PR close?

- Closes apache#22775.

## Rationale for this change

The `opt_filter` on `GroupsAccumulator::merge_batch` is a dead parameter.

Aggregate `FILTER` clauses only apply to raw input rows in the update phase (`update_batch`). `merge_batch` combines already pre-aggregated states, so there is no per-row filtering to do; `opt_filter` is meaningless there.

The code confirms this:

- The only production caller (`row_hash.rs`) always passed `None`.
- Existing implementations already ignored it, e.g. `correlation.rs` asserted `opt_filter.is_none()`, and Spark `avg` used `_opt_filter`.

## What changes are included in this PR?

- Removed `opt_filter` from `merge_batch` in the trait and all implementations (built-in aggregates, `physical-expr-common`, `functions-aggregate-common`, Spark, and FFI).
- Updated the trait docs to say `merge_batch` has no `opt_filter` because filtering happens in the update phase.
- Changed the group zero-init path in `row_hash.rs` to always use `update_batch` with an all-false filter instead of branching to `merge_batch`. `update_batch` always takes raw argument types (what `aggregate_arguments` provides), and since every row is filtered out the data never matters; this is simpler and more correct.
- Updated all call sites and tests.

## Are these changes tested?

Yes. Existing aggregate tests cover this and were updated to the new signature. The `first_last` tests were adjusted (with comments) to match the merge behavior without a filter, and the FFI and Spark tests were updated too.

## Are there any user-facing changes?

Yes. This is a breaking change to the public `GroupsAccumulator` trait: `opt_filter` is removed from `merge_batch`. Custom implementations and direct callers must update their signatures.
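The resulting asymmetry between the two methods can be sketched with a deliberately small trait. `GroupsAccumulatorSketch` and `SumGroups` are hypothetical and use plain slices; the real `GroupsAccumulator` trait works on Arrow arrays and has more methods.

```rust
/// Hypothetical, slice-based miniature of a grouped accumulator.
pub trait GroupsAccumulatorSketch {
    /// Update phase: raw input rows, optionally filtered per row (aggregate `FILTER`).
    fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    );

    /// Merge phase: pre-aggregated partial states, so there is nothing to filter.
    fn merge_batch(&mut self, partial_sums: &[i64], group_indices: &[usize], total_num_groups: usize);
}

#[derive(Debug, Default)]
pub struct SumGroups {
    pub sums: Vec<i64>,
}

impl GroupsAccumulatorSketch for SumGroups {
    fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        // New groups start at zero even if every one of their rows is filtered out.
        self.sums.resize(total_num_groups, 0);
        for (row, (&value, &group)) in values.iter().zip(group_indices).enumerate() {
            if opt_filter.map_or(true, |filter| filter[row]) {
                self.sums[group] += value;
            }
        }
    }

    fn merge_batch(&mut self, partial_sums: &[i64], group_indices: &[usize], total_num_groups: usize) {
        self.sums.resize(total_num_groups, 0);
        for (&partial, &group) in partial_sums.iter().zip(group_indices) {
            self.sums[group] += partial;
        }
    }
}
```

In this sketch, calling `update_batch` with an all-false filter creates the new groups at their zero value without reading any data, which mirrors the zero-init path described in the list above.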
…ntations (apache#22913) ## Which issue does this PR close? - Closes apache#22912. ## Rationale for this change This change makes testing null_aware behavior easier, and also makes the performance of various joins clearer, since null_aware joins do extra work. This was originally part of apache#21585, but there seems to be a lot of activity around null-aware joins, so I figured it's worth splitting out. ## What changes are included in this PR? Add a `null_aware` indication to relevant Display implementations when appropriate. ## Are these changes tested? SLT tests ## Are there any user-facing changes? Display only --------- Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
Bumps [pyjwt](https://github.com/jpadilla/pyjwt) from 2.12.0 to 2.13.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/jpadilla/pyjwt/releases">pyjwt's releases</a>.</em></p> <blockquote> <h2>2.13.0</h2> <h1>PyJWT 2.13.0 — Security Release</h1> <p>This release bundles five security fixes plus three additional hardening / spec-compliance changes. We recommend all users upgrade.</p> <h2>Security</h2> <ul> <li> <p><strong><a href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-xgmm-8j9v-c9wx"><code>GHSA-xgmm-8j9v-c9wx</code></a> — JWK JSON accepted as HMAC secret (algorithm confusion).</strong> <code>HMACAlgorithm.prepare_key</code> previously rejected PEM- and SSH-formatted asymmetric keys but did not catch a JWK passed as a raw JSON string. In a verifier configured with both symmetric and asymmetric algorithms in <code>algorithms=[…]</code> and a raw-JSON JWK as the key, an attacker could forge HS256 tokens using the JWK text as the HMAC secret. The guard has been extended to reject any JWK-shaped JSON. <em>Reported by <a href="https://github.com/aradona91"><code>@aradona91</code></a>.</em></p> </li> <li> <p><strong><a href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-jq35-7prp-9v3f"><code>GHSA-jq35-7prp-9v3f</code></a> — Algorithm allow-list bypass with <code>PyJWK</code> / <code>PyJWKClient</code>.</strong> When verifying with a <code>PyJWK</code>, the caller's <code>algorithms=[…]</code> allow-list was checked against the token header <code>alg</code> as a string only; actual verification used the algorithm bound to the <code>PyJWK</code>. An attacker who controlled a registered JWKS key could sign with one algorithm and advertise another on the header. PyJWT now requires the token header <code>alg</code> to match the <code>PyJWK</code>'s algorithm before verification. 
<em>Reported by <a href="https://github.com/sushi-gif"><code>@sushi-gif</code></a>.</em></p> </li> <li> <p><strong><a href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-w7vc-732c-9m39"><code>GHSA-w7vc-732c-9m39</code></a> — DoS via base64 decode of unused payload segment when <code>b64=false</code>.</strong> For detached-payload JWS (<code>b64=false</code>), the compact-form payload segment was base64-decoded before being discarded in favor of the caller-supplied <code>detached_payload</code>. An attacker could inflate the unused segment to force CPU + memory cost without holding a valid signature. The segment is now required to be empty per RFC 7515 Appendix F, and is no longer decoded. <em>Reported by <a href="https://github.com/thesmartshadow"><code>@thesmartshadow</code></a>.</em></p> </li> <li> <p><strong><a href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-993g-76c3-p5m4"><code>GHSA-993g-76c3-p5m4</code></a> — <code>PyJWKClient</code> accepts non-HTTP(S) URIs.</strong> <code>PyJWKClient.fetch_data</code> passed its URI to <code>urllib.request.urlopen</code>, which by default also handles <code>file://</code>, <code>ftp://</code>, and <code>data:</code> schemes. An application that fed an attacker-influenced URI into <code>PyJWKClient</code> could be coerced into reading local files or reaching other unintended schemes. <code>PyJWKClient</code> now rejects any URI whose scheme isn't <code>http</code> or <code>https</code>. <em>Reported by <a href="https://github.com/KEIJOT"><code>@KEIJOT</code></a>.</em></p> </li> <li> <p><strong><a href="https://github.com/jpadilla/pyjwt/security/advisories/GHSA-fhv5-28vv-h8m8"><code>GHSA-fhv5-28vv-h8m8</code></a> — <code>PyJWKClient</code> cache wiped on fetch error.</strong> A <code>finally</code>-block <code>put(jwk_set=None)</code> cleared the JWK Set cache whenever a fetch raised, turning a transient JWKS-endpoint outage into application-wide auth failure. 
The cache write was moved into the success path; transient errors no longer evict valid cached keys. <em>Reported by <a href="https://github.com/eddieran"><code>@eddieran</code></a>.</em></p> </li> </ul> <h2>Fixed</h2> <ul> <li>Reject empty HMAC keys outright in <code>HMACAlgorithm.prepare_key</code> with <code>InvalidKeyError</code> instead of accepting them with only a warning. Defends against the <code>os.getenv("JWT_SECRET", "")</code> footgun. <em>Thanks to <a href="https://github.com/SnailSploit"><code>@SnailSploit</code></a> and <a href="https://github.com/spartan8806"><code>@spartan8806</code></a> for the reports.</em></li> <li>Forward per-call <code>options</code> (including <code>enforce_minimum_key_length</code>) from <code>PyJWT.decode</code> through to <code>PyJWS._verify_signature</code>. The option was previously silently dropped between the two layers, so it only took effect when set on the <code>PyJWT</code> instance. <em>Thanks to <a href="https://github.com/WLUB"><code>@WLUB</code></a> for the report.</em></li> <li><strong>RFC 7797 §3 compliance for <code>b64=false</code>:</strong> the encoder now auto-adds <code>"b64"</code> to <code>crit</code>, and the decoder rejects tokens that set <code>b64=false</code> without listing it in <code>crit</code>. <em>Thanks to <a href="https://github.com/MachineLearning-Nerd"><code>@MachineLearning-Nerd</code></a> for the report.</em></li> </ul> <h2>Changed</h2> <ul> <li>Migrate the <code>dev</code>, <code>docs</code>, and <code>tests</code> package extras to dependency groups, by <a href="https://github.com/kurtmckee"><code>@kurtmckee</code></a> in <a href="https://redirect.github.com/jpadilla/pyjwt/pull/1152">#1152</a>.</li> </ul> <h2>Upgrade notes</h2> <p>Most fixes are invisible to correctly-configured callers. 
A few behavioral changes you may encounter:</p> <ul> <li><strong>Empty HMAC keys now raise.</strong> If your app passed <code>""</code> or <code>b""</code> as a secret (often via a missing env var, e.g. <code>os.getenv("JWT_SECRET", "")</code>), <code>encode</code>/<code>decode</code> will now raise <code>InvalidKeyError</code>. This is the intended behavior — fix the configuration.</li> <li><strong><code>PyJWK</code> decoding now requires the token's <code>alg</code> to match the JWK's algorithm.</strong> Previously a mismatch was silently honored if the header <code>alg</code> appeared in the allow-list. Tokens that relied on this mismatch will now fail with <code>InvalidAlgorithmError</code>.</li> <li><strong><code>PyJWKClient</code> now rejects non-HTTP(S) URIs at construction time.</strong> Tests or dev environments that fetched JWKS from <code>file://</code> URIs need to switch to a local HTTP server or load the JWKS by other means (e.g. construct <code>PyJWKSet.from_dict(...)</code> directly).</li> <li><strong><code>b64=false</code> tokens are now strictly RFC 7515 / 7797 compliant.</strong> Tokens with a non-empty compact-form payload segment, or that omit <code>"b64"</code> from <code>crit</code>, will be rejected. 
PyJWT-produced tokens always satisfy both invariants, so round-trips through PyJWT are unaffected.</li> <li><strong><code>enforce_minimum_key_length</code> set per-call now takes effect.</strong> Callers who passed <code>options={"enforce_minimum_key_length": True}</code> to <code>jwt.decode()</code> previously got no enforcement; they will now get <code>InvalidKeyError</code> on undersized keys, as documented.</li> </ul> <p><strong>Full changelog:</strong> <a href="https://github.com/jpadilla/pyjwt/compare/2.12.1...2.13.0">https://github.com/jpadilla/pyjwt/compare/2.12.1...2.13.0</a></p> <h2>2.12.1</h2> <h2>What's Changed</h2> <ul> <li>Add typing_extensions dependency for Python < 3.11 by <a href="https://github.com/jpadilla"><code>@jpadilla</code></a> in <a href="https://redirect.github.com/jpadilla/pyjwt/pull/1151">jpadilla/pyjwt#1151</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/jpadilla/pyjwt/compare/2.12.0...2.12.1">https://github.com/jpadilla/pyjwt/compare/2.12.0...2.12.1</a></p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/jpadilla/pyjwt/blob/master/CHANGELOG.rst">pyjwt's changelog</a>.</em></p> <blockquote> <h2><code>v2.13.0 <https://github.com/jpadilla/pyjwt/compare/2.12.1...2.13.0></code>__</h2> <p>Security</p> <pre><code> - Reject JWK JSON documents passed as raw HMAC secrets in ``HMACAlgorithm.prepare_key`` to close an algorithm-confusion gap that the existing PEM/SSH guard did not cover. Reported by @aradona91 in `GHSA-xgmm-8j9v-c9wx <https://github.com/jpadilla/pyjwt/security/advisories/GHSA-xgmm-8j9v-c9wx>`__. - Bind the JWT header ``alg`` to ``PyJWK.algorithm_name`` during verification so the caller's ``algorithms=[...]`` allow-list cannot be bypassed when decoding with a ``PyJWK`` / ``PyJWKClient`` key. Reported by @sushi-gif in `GHSA-jq35-7prp-9v3f <https://github.com/jpadilla/pyjwt/security/advisories/GHSA-jq35-7prp-9v3f>`__. 
- Reject non-``http(s)`` URI schemes in ``PyJWKClient`` so attacker- influenced URIs cannot read local files or reach unintended schemes via urllib's default ``file://`` / ``ftp://`` / ``data:`` handlers. Reported by @KEIJOT in `GHSA-993g-76c3-p5m4 <https://github.com/jpadilla/pyjwt/security/advisories/GHSA-993g-76c3-p5m4>`__. - Preserve the cached JWK Set on fetch errors in ``PyJWKClient.fetch_data``. The previous ``finally``-block ``put(None)`` pattern cleared the cache on any transient outage, turning one bad JWKS request into application- wide auth failure. Reported by @eddieran in `GHSA-fhv5-28vv-h8m8 <https://github.com/jpadilla/pyjwt/security/advisories/GHSA-fhv5-28vv-h8m8>`__. - Skip the unconditional base64 decode of the compact-form payload segment when ``b64=false`` is set in the protected header, and require that segment to be empty (RFC 7515 Appendix F detached form). Closes an unauthenticated DoS amplifier. Reported by @thesmartshadow in `GHSA-w7vc-732c-9m39 <https://github.com/jpadilla/pyjwt/security/advisories/GHSA-w7vc-732c-9m39>`__. <p>Fixed</p> <pre><code> - Reject empty HMAC keys outright in ``HMACAlgorithm.prepare_key`` with ``InvalidKeyError`` instead of accepting them with only a warning. Thanks to @SnailSploit and @spartan8806 for independently flagging the footgun. - Forward per-call ``options`` (including ``enforce_minimum_key_length``) from ``PyJWT.decode`` through to ``PyJWS._verify_signature`` so the option actually takes effect when set at the call site rather than only on the ``PyJWT`` instance. Thanks to @WLUB for the report. - RFC 7797 §3 compliance for ``b64=false``: the encoder now auto-adds ``&quot;b64&quot;`` to the ``crit`` header parameter, and the decoder rejects tokens that set ``b64=false`` without listing it in ``crit``. Thanks to @MachineLearning-Nerd for the report. 
Changed </code></pre> <ul> <li>Migrate the <code>dev</code>, <code>docs</code>, and <code>tests</code> package extras to dependency groups by <a href="https://github.com/kurtmckee"><code>@kurtmckee</code></a> in <code>[apache#1152](jpadilla/pyjwt#1152) &lt;https://github.com/jpadilla/pyjwt/pull/1152&gt;</code>__</li> </ul> <p><code>v2.12.1 &lt;https://github.com/jpadilla/pyjwt/compare/2.12.0...2.12.1&gt;</code>__ </tr></table> </code></pre></p> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/jpadilla/pyjwt/commit/7144e4534c34810f4525dc4578a32addd8212cff"><code>7144e45</code></a> Apply ruff format</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/d2f4bec4963897c0ef96ef64a875894f2c8542ab"><code>d2f4bec</code></a> Restore <code>cast()</code> calls with cross-version <code>type: ignore</code> for <code>prepare_key</code></li> <li><a href="https://github.com/jpadilla/pyjwt/commit/22f478cebddd8294259c30f037ecb92b0b348774"><code>22f478c</code></a> Remove redundant casts in <code>RSAAlgorithm.prepare_key</code> and `ECAlgorithm.prepare...</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/95791b1759b8aa4f2203575d344d5c78564cdc81"><code>95791b1</code></a> Bundle security fixes and hardening into 2.13.0</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/dcc27a9d3182a2349c30b160758785c6ce7a6508"><code>dcc27a9</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/jpadilla/pyjwt/issues/1155">#1155</a>)</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/9d08a9a1896845ed8eaf88e6f6ac61e5800c3e7a"><code>9d08a9a</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/jpadilla/pyjwt/issues/1146">#1146</a>)</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/b87c10014d4109f0214fea188d00faaaf8a80e64"><code>b87c100</code></a> Bump codecov/codecov-action from 5 to 6 (<a 
href="https://redirect.github.com/jpadilla/pyjwt/issues/1154">#1154</a>)</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/40e3147eb5f790d8d041772e5fc00728a176c812"><code>40e3147</code></a> Migrate development extras to dependency groups (<a href="https://redirect.github.com/jpadilla/pyjwt/issues/1152">#1152</a>)</li> <li><a href="https://github.com/jpadilla/pyjwt/commit/a4e1a3d1218b01c5806420b8f16d9308ac4adc30"><code>a4e1a3d</code></a> Add typing_extensions dependency for Python < 3.11 (<a href="https://redirect.github.com/jpadilla/pyjwt/issues/1151">#1151</a>)</li> <li>See full diff in <a href="https://github.com/jpadilla/pyjwt/compare/2.12.0...2.13.0">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/datafusion/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close? - Closes apache#22932 ## Rationale for this change `depcheck` stopped working because an underlying dependency released a non-semver compatible change (`time 0.3.48`, not going to link to the issue there because people are already spamming it). Because there was no lockfile, every run pulled the most recent versions for all dependencies, which don't currently compile. ## What changes are included in this PR? 1. Add lockfile to `depcheck` 2. Make sure to run `depcheck` with `--lock` in CI ## Are these changes tested? Tested locally ## Are there any user-facing changes? None Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
## Which issue does this PR close? - Closes apache#19057. ## Rationale for this change The binary `gcd` and `lcm` UDFs in the datafusion-functions crate support only Int64, not decimals. This PR adds the missing decimal support. ## What changes are included in this PR? 1. Updated the gcd and lcm functions to add decimal support. The integer path is more performant and stays intact. For decimals, the Euclidean algorithm is used for GCD. 2. Added coercion rules: cast to decimal if any argument is decimal; otherwise stay with ints as before. 3. Extracted common functionality to `common.rs` to avoid inter-UDF dependency. 4. In order to use `calculate_binary_math` for decimals, updated it to accept a target type instead of the raw `Decimal128Type::DATA_TYPE`, which causes scaling issues for these UDFs; see apache#19621. A bit more on (4). The driving force is this failing example:

```sql
query R
select gcd(2::decimal(38, 0), 3::decimal(38, 0));
----
1
```

Previously in apache#19874, I suggested a more complicated solution to extend `calculate_binary_math`. However, it only affected gcd/lcm and could be considered overkill. This PR instead extends these functions with an extra parameter, `cast_target`, for `calculate_binary_decimal_math`, so that the cast goes to the actual type used rather than to the default `Decimal128Type::DATA_TYPE`, which is much lighter. ## Are these changes tested? - Added unit tests for the UDFs with decimals for array and scalar paths - Added unit tests for the gcd/lcm math itself - Added new SLT tests for decimals ## Are there any user-facing changes? No
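As a rough illustration of the Euclidean approach mentioned above, the sketch below computes GCD and LCM on the unscaled `i128` values that back a `Decimal128`. It assumes both operands share scale 0, as in the `decimal(38, 0)` example; the function names `gcd_unscaled` and `lcm_unscaled` are hypothetical and this is not the code from the PR.

```rust
/// Euclidean GCD on unscaled decimal values.
/// Assumption: both operands use scale 0, so the unscaled integers are the values.
/// Returns `None` when the result does not fit in `i128` (only for `i128::MIN`).
fn gcd_unscaled(a: i128, b: i128) -> Option<i128> {
    // Work on magnitudes so that negative inputs and `i128::MIN` are handled.
    let (mut a, mut b) = (a.unsigned_abs(), b.unsigned_abs());
    while b != 0 {
        let remainder = a % b;
        a = b;
        b = remainder;
    }
    i128::try_from(a).ok()
}

/// LCM via `|a / gcd(a, b) * b|`, using checked arithmetic so that an
/// overflow is reported as `None` instead of wrapping or panicking.
fn lcm_unscaled(a: i128, b: i128) -> Option<i128> {
    if a == 0 || b == 0 {
        return Some(0);
    }
    let g = gcd_unscaled(a, b)?;
    (a / g).checked_mul(b).and_then(|product| product.checked_abs())
}
```

For the failing query above, `gcd_unscaled(2, 3)` yields `Some(1)`. Dividing by the GCD before multiplying keeps the intermediate value small, which delays overflow for large operands.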
… files (apache#22604) ## Which issue does this PR close? - Part of apache#20135 ## Rationale for this change This PR includes the "front end" side of @mbutrovich's apache#22026, bridging the last mile to allow users to query file row indexes. ## What changes are included in this PR? 1. A new Scalar UDF `file_row_index`, following apache#20071's example. The function returns 0-based row indexes for Parquet scans. 2. Expands the row-filter PushdownChecker to also check whether the predicate contains the new function, denying pushdown if it does. 3. I've added a couple of utilities to find or rewrite ScalarUDF instances in physical expression trees; I've seen @alamb point this mistake out in multiple PRs (including [here](apache#20071 (comment))). They can also be used in apache#20071. They are currently in `schema_rewriter.rs`, which was the best place I could think of, but maybe they should be moved elsewhere. 4. A dedicated rewrite function for `file_row_index`, which turns it into a `Cast(Column(...))`, which is required to return Int64 values. 5. In `ParquetSource::try_pushdown_projection`, we look for `FileRowIndexFunc`, and if it exists we rewrite it and the source's table schema. ## Are these changes tested? In addition to individual unit tests, I've added a new SLT file (`file_row_index.slt`) that tests for the following cases: 1. Querying `file_row_index` from a table backed by multiple files 2. Filtering on `file_row_index` when it's part of the projection 3. Filtering on `file_row_index` when it's **not** part of the projection, when filter pushdown is either enabled or disabled (this part didn't work in a previous iteration, but I figured it out today). ## Are there any user-facing changes? 1. New scalar function type: `FileRowIndexFunc`/`file_row_index`. 2. Rewrite logic in `physical-expr-adapter`: `rewrite_file_row_index_expr` specifically for the new UDF, `rewrite_file_row_index_projection` to rewrite the `ProjectionExprs`, and two utility functions that should make it clearer how to find and manipulate ScalarUDFs in physical expressions, `expr_references_scalar_udf` and `rewrite_scalar_udf`. --------- Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
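The find-and-rewrite utilities described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Expr` enum, the `references_udf` and `rewrite_udf` helpers, and the `row_index` column name are hypothetical, and the real utilities operate on DataFusion physical expressions with their own signatures.

```rust
/// Hypothetical, tiny expression tree; not DataFusion's `PhysicalExpr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    ScalarFunction { name: String, args: Vec<Expr> },
    Cast(Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Returns true if `expr` calls the named function anywhere in its tree.
/// A pushdown check could refuse predicates for which this is true.
fn references_udf(expr: &Expr, udf_name: &str) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::ScalarFunction { name, args } => {
            name.as_str() == udf_name || args.iter().any(|arg| references_udf(arg, udf_name))
        }
        Expr::Cast(inner) => references_udf(inner, udf_name),
        Expr::Gt(left, right) => references_udf(left, udf_name) || references_udf(right, udf_name),
    }
}

/// Replaces every call to the named function with a clone of `replacement`,
/// walking the whole tree and leaving other nodes unchanged.
fn rewrite_udf(expr: Expr, udf_name: &str, replacement: &Expr) -> Expr {
    match expr {
        Expr::ScalarFunction { name, .. } if name.as_str() == udf_name => replacement.clone(),
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args
                .into_iter()
                .map(|arg| rewrite_udf(arg, udf_name, replacement))
                .collect(),
        },
        Expr::Cast(inner) => Expr::Cast(Box::new(rewrite_udf(*inner, udf_name, replacement))),
        Expr::Gt(left, right) => Expr::Gt(
            Box::new(rewrite_udf(*left, udf_name, replacement)),
            Box::new(rewrite_udf(*right, udf_name, replacement)),
        ),
        leaf => leaf,
    }
}
```

With a predicate such as `file_row_index() > 10`, the first helper reports a match, and the second turns the call into `Cast(Column("row_index"))`, which loosely mirrors the `Cast(Column(...))` rewrite mentioned in item 4.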
…ion/wasmtest/datafusion-wasm-app (apache#22970) Bumps [launch-editor](https://github.com/vitejs/launch-editor) from 2.10.0 to 2.14.1. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/vitejs/launch-editor/commit/3f97c648307a70e7c930ba63f040e6ba8e72bc16"><code>3f97c64</code></a> v2.14.1</li> <li><a href="https://github.com/vitejs/launch-editor/commit/0cc9550e05c35224a1f61914b9731723f78c06a7"><code>0cc9550</code></a> fix: reject UNC paths (<a href="https://redirect.github.com/vitejs/launch-editor/issues/138">#138</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/afd1ab907769adbb0fb76b564551f3cd24822ea6"><code>afd1ab9</code></a> ci: run tests on mac and windows (<a href="https://redirect.github.com/vitejs/launch-editor/issues/136">#136</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/0bfa328e4d2a70d3accd86645efde43a5a8fd931"><code>0bfa328</code></a> test: add some tests for launch-editor package (<a href="https://redirect.github.com/vitejs/launch-editor/issues/135">#135</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/1b006ae00061e83628c884770df4869bac481ed3"><code>1b006ae</code></a> chore: add README (<a href="https://redirect.github.com/vitejs/launch-editor/issues/134">#134</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/383ef26fceaaab75e4f122bf833a8d457e187272"><code>383ef26</code></a> v2.14.0</li> <li><a href="https://github.com/vitejs/launch-editor/commit/6277209316b28bd76b057a97fcdfa42da5adc181"><code>6277209</code></a> ci: harden publish settings</li> <li><a href="https://github.com/vitejs/launch-editor/commit/520b2f787af1bbe3d2333d952547fc90f2c01bb3"><code>520b2f7</code></a> fix(deps): update all non-major dependencies (<a href="https://redirect.github.com/vitejs/launch-editor/issues/129">#129</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/475ac662aeaa40eebe319f76acf7bd7ff2aba5d5"><code>475ac66</code></a> chore(deps): update 
dependency lint-staged to v17 (<a href="https://redirect.github.com/vitejs/launch-editor/issues/130">#130</a>)</li> <li><a href="https://github.com/vitejs/launch-editor/commit/247bf1dfe2c75efb50f507d33c2c5a3cbf4e20d0"><code>247bf1d</code></a> chore(deps): update dependency yorkie to v2 (<a href="https://redirect.github.com/vitejs/launch-editor/issues/131">#131</a>)</li> <li>Additional commits viewable in <a href="https://github.com/vitejs/launch-editor/compare/v2.10.0...v2.14.1">compare view</a></li> </ul> </details> <details> <summary>Maintainer changes</summary> <p>This version was pushed to npm by <a href="https://www.npmjs.com/~GitHub%20Actions">GitHub Actions</a>, a new releaser for launch-editor since your current version.</p> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.7 to 48.0.1. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst">cryptography's changelog</a>.</em></p> <blockquote> <p>48.0.1 - 2026-06-09</p> <pre><code> * Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 4.0.1. <p>.. _v48-0-0:</p> <p>48.0.0 - 2026-05-04<br /> </code></pre></p> <ul> <li> <p><strong>BACKWARDS INCOMPATIBLE:</strong> Support for Python 3.8 has been removed. <code>cryptography</code> now requires Python 3.9 or later.</p> </li> <li> <p><strong>BACKWARDS INCOMPATIBLE:</strong> Loading an X.509 CRL whose inner <code>TBSCertList.signature</code> algorithm does not match the outer <code>signatureAlgorithm</code> now raises <code>ValueError</code>. Previously, such CRLs were parsed successfully and only rejected during signature validation.</p> </li> <li> <p>Added support for :doc:<code>/hazmat/primitives/asymmetric/mlkem</code> and :doc:<code>/hazmat/primitives/asymmetric/mldsa</code> when using OpenSSL 3.5.0 or later, in addition to the existing AWS-LC and BoringSSL support. This means post-quantum algorithms are now available to users of our wheels.</p> <ul> <li><strong>Note:</strong> Going forward, we do not guarantee that all functionality in <code>cryptography</code> will be available when building against OpenSSL. See :doc:<code>/statements/state-of-openssl</code> for more information.</li> </ul> </li> </ul> <p>.. _v47-0-0:</p> <p>47.0.0 - 2026-04-24</p> <pre><code> * Support for Python 3.8 is deprecated and will be removed in the next ``cryptography`` release. * **BACKWARDS INCOMPATIBLE:** Support for binary elliptic curves (``SECT*`` classes) has been removed. These curves are rarely used and have additional security considerations that make them undesirable. * **BACKWARDS INCOMPATIBLE:** Support for OpenSSL 1.1.x has been removed. OpenSSL 3.0.0 or later is now required. 
LibreSSL, BoringSSL, and AWS-LC continue to be supported. * **BACKWARDS INCOMPATIBLE:** Dropped support for LibreSSL < 4.1. * **BACKWARDS INCOMPATIBLE:** Loading keys with unsupported algorithms or keys with unsupported explicit curve encodings now raises :class:`~cryptography.exceptions.UnsupportedAlgorithm` instead of ``ValueError``. This change affects :func:`~cryptography.hazmat.primitives.serialization.load_pem_private_key`, :func:`~cryptography.hazmat.primitives.serialization.load_der_private_key`, :func:`~cryptography.hazmat.primitives.serialization.load_pem_public_key`, :func:`~cryptography.hazmat.primitives.serialization.load_der_public_key`, and :meth:`~cryptography.x509.Certificate.public_key` when called on certificates with unsupported public key algorithms. </tr></table> </code></pre> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pyca/cryptography/commit/de987ce48ccfeb1abca41efa23b2bf73ec704f74"><code>de987ce</code></a> 48.0.1 version bump and changelog (<a href="https://redirect.github.com/pyca/cryptography/issues/14996">#14996</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/8e03e30e3aae01632a697e903e3593c924f0139d"><code>8e03e30</code></a> bump for 48.0.0 release (<a href="https://redirect.github.com/pyca/cryptography/issues/14796">#14796</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/295e0d254ef31ab864730aa41312ec355416ee71"><code>295e0d2</code></a> Add AGENTS.md with CLAUDE.md symlink (<a href="https://redirect.github.com/pyca/cryptography/issues/14794">#14794</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/104a2de19e268a433e6da92be9cb872dcf0003c8"><code>104a2de</code></a> Bump BoringSSL, OpenSSL, AWS-LC in CI (<a href="https://redirect.github.com/pyca/cryptography/issues/14793">#14793</a>)</li> <li><a 
href="https://github.com/pyca/cryptography/commit/67ec1e51988195e17993d2edef5258b27509b926"><code>67ec1e5</code></a> call check_length early on AesSiv::encrypt (<a href="https://redirect.github.com/pyca/cryptography/issues/14792">#14792</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/b2da57a0d9e4bfd2b95364299091a18f74127b26"><code>b2da57a</code></a> changelog for mldsa/mlkem for openssl (<a href="https://redirect.github.com/pyca/cryptography/issues/14791">#14791</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/3cf44adee25c368d4a136e072fa9f80465d91eb0"><code>3cf44ad</code></a> ML-KEM OpenSSL support (<a href="https://redirect.github.com/pyca/cryptography/issues/14781">#14781</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/2e31639666766f846fbab2c605879db0fa64fe83"><code>2e31639</code></a> ML-DSA OpenSSL support (<a href="https://redirect.github.com/pyca/cryptography/issues/14773">#14773</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/5affe5a286a986fdf512c4a5cb280d28a96c10e3"><code>5affe5a</code></a> fix rust nightly clippy (<a href="https://redirect.github.com/pyca/cryptography/issues/14790">#14790</a>)</li> <li><a href="https://github.com/pyca/cryptography/commit/2e73ca448eaf64b6f0d4ffbb794cf96170cef5ec"><code>2e73ca4</code></a> bump rust-openssl dep and update EcPoint::mul_generator to mul_generator2 (<a href="https://redirect.github.com/pyca/cryptography/issues/1">#1</a>...</li> <li>Additional commits viewable in <a href="https://github.com/pyca/cryptography/compare/46.0.7...48.0.1">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Which issue does this PR close? - Closes apache#22964 ## Rationale for this change We presently allow dynamic filter pushdown to be applied to null-equal hash joins. This can push a predicate down into the probe-side plan, where it is not evaluated with the null-equal semantics the join requires. Longer-term, we might consider supporting this case with the correct semantics (e.g., generate a predicate with `OR IS NULL ...`), but for now disabling pushdown for null-equal joins seems much more practical. ## What changes are included in this PR? * Disable hash join dynamic filter pushdown for null-equal joins * Add SLT test with end-to-end repro * Add unit test ## Are these changes tested? Yes. ## Are there any user-facing changes? No.
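The mismatch described above can be reproduced in miniature. The sketch below models join keys as `Option<i64>` (with `None` standing for NULL) and uses a hypothetical membership-style filter derived from the build side; the actual form of DataFusion's dynamic filter is not shown in this PR description, so the filter shape is an assumption.

```rust
/// Null-equal key comparison (like `IS NOT DISTINCT FROM`): NULL matches NULL.
fn null_equal(a: Option<i64>, b: Option<i64>) -> bool {
    a == b
}

/// Ordinary SQL equality used as a filter: a comparison involving NULL is
/// never true, so such rows do not pass.
fn sql_eq_is_true(a: Option<i64>, b: Option<i64>) -> bool {
    matches!((a, b), (Some(x), Some(y)) if x == y)
}

/// Counts the output rows of an inner join with null-equal key semantics.
fn null_equal_join_count(build: &[Option<i64>], probe: &[Option<i64>]) -> usize {
    probe
        .iter()
        .map(|p| build.iter().filter(|b| null_equal(**b, *p)).count())
        .sum()
}

/// Hypothetical pushed-down filter: keep probe rows whose key equals some
/// build key under ordinary SQL equality.
fn apply_pushed_down_filter(build: &[Option<i64>], probe: &[Option<i64>]) -> Vec<Option<i64>> {
    probe
        .iter()
        .copied()
        .filter(|p| build.iter().any(|b| sql_eq_is_true(*b, *p)))
        .collect()
}
```

With build keys `[Some(1), None]` and probe keys `[Some(1), None, Some(2)]`, the unfiltered join produces two rows, but filtering the probe side first drops the NULL key and leaves only one. That lost row is the incorrect result that disabling pushdown avoids.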
Bumps [insta-cmd](https://github.com/mitsuhiko/insta-cmd) from 0.6.0 to 0.7.0. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/aa305fda9cbdac0a8d02f71acb4fd43bfa5b0c82"><code>aa305fd</code></a> 0.7.0 (<a href="https://redirect.github.com/mitsuhiko/insta-cmd/issues/22">#22</a>)</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/c1b0eba01e3d48aab27da4e77d5a19d7b5fff065"><code>c1b0eba</code></a> Merge pull request <a href="https://redirect.github.com/mitsuhiko/insta-cmd/issues/16">#16</a> from rcorre/main</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/2089c645757be57970eeb9c6c2a0ae3abfa60159"><code>2089c64</code></a> Merge pull request <a href="https://redirect.github.com/mitsuhiko/insta-cmd/issues/20">#20</a> from max-sixty/env-var</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/7680d5ef1cde9d1ad6b678e695880933938339d9"><code>7680d5e</code></a> Bump MSRV from 1.57.0 to 1.68.0</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/7f338386715b354ad64339a2e5fd642cc71d82ba"><code>7f33838</code></a> Bump MSRV from 1.57.0 to 1.64.0</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/8a2d01eed539c11160a1dd7a8fc6393423cf45ee"><code>8a2d01e</code></a> Fix: Distinguish env_remove() from empty string in snapshots</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/0310b186e62ef24ddd0909e497044192be334b9e"><code>0310b18</code></a> Add test demonstrating env_remove() bug</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/c6a993890292bb16b60c8418a5e3e178d1520d51"><code>c6a9938</code></a> Note that integration tests build binaries.</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/1f39ad19b3cf0faf20609097f46788b9b2d80e24"><code>1f39ad1</code></a> 0.6.0</li> <li><a href="https://github.com/mitsuhiko/insta-cmd/commit/26c42b6809da9de7f2473907b17e3954ba4d78f0"><code>26c42b6</code></a> Add windows tests (<a 
href="https://redirect.github.com/mitsuhiko/insta-cmd/issues/13">#13</a>)</li> <li>See full diff in <a href="https://github.com/mitsuhiko/insta-cmd/compare/0.6.0...0.7.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…<2 in /docs (apache#22974) Updates the requirements on [maturin](https://github.com/pyo3/maturin) to permit the latest version. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pyo3/maturin/releases">maturin's releases</a>.</em></p> <blockquote> <h2>v1.14.0</h2> <h2>What's Changed</h2> <ul> <li>Support parent-relative pyproject metadata in sdists by <a href="https://github.com/Copilot"><code>@Copilot</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3182">PyO3/maturin#3182</a></li> <li>Update PyPI platform tag validation by <a href="https://github.com/messense"><code>@messense</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3187">PyO3/maturin#3187</a></li> <li>maint: update setup emsdk action in generate-ci by <a href="https://github.com/ManonMarchand"><code>@ManonMarchand</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3194">PyO3/maturin#3194</a></li> <li>fix: only shim bin wheels during auditwheel repair by <a href="https://github.com/messense"><code>@messense</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3197">PyO3/maturin#3197</a></li> <li>fix: avoid editable ELF truncation from stale hardlinks by <a href="https://github.com/messense"><code>@messense</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3199">PyO3/maturin#3199</a></li> <li>Fix Pyodide Emscripten platform tags by <a href="https://github.com/messense"><code>@messense</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3191">PyO3/maturin#3191</a></li> <li>build(deps): bump openssl from 0.10.79 to 0.10.80 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/PyO3/maturin/pull/3200">PyO3/maturin#3200</a></li> <li>Use pax instead of GNU headers for tar by <a href="https://github.com/konstin"><code>@konstin</code></a> in <a 
href="https://redirect.github.com/PyO3/maturin/pull/3203">PyO3/maturin#3203</a></li> <li>feat: add default exclude <strong>pycache</strong> and *.pyc files by <a href="https://github.com/GuYeying"><code>@GuYeying</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3202">PyO3/maturin#3202</a></li> <li>Documentation: fix the update_readme script by <a href="https://github.com/ManonMarchand"><code>@ManonMarchand</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3186">PyO3/maturin#3186</a></li> <li>Fix python versions used in integration_pyo3_mixed_conda by <a href="https://github.com/ngoldbaum"><code>@ngoldbaum</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3208">PyO3/maturin#3208</a></li> <li>Add support for finding free-threaded interpreters for --find-interpreters by <a href="https://github.com/ngoldbaum"><code>@ngoldbaum</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3206">PyO3/maturin#3206</a></li> <li>build(deps): bump tar from 0.4.45 to 0.4.46 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/PyO3/maturin/pull/3209">PyO3/maturin#3209</a></li> <li>Stubs: also generate them for mixed PyO3 projects by <a href="https://github.com/Tpt"><code>@Tpt</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3211">PyO3/maturin#3211</a></li> <li>Don't depend on CFFI on PyPy by <a href="https://github.com/ngoldbaum"><code>@ngoldbaum</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3213">PyO3/maturin#3213</a></li> <li>resolve_platform_tags: remove a redundant case by <a href="https://github.com/Tpt"><code>@Tpt</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3214">PyO3/maturin#3214</a></li> <li>GeneratorOutput::additional_files remove Option by <a href="https://github.com/Tpt"><code>@Tpt</code></a> in <a 
href="https://redirect.github.com/PyO3/maturin/pull/3215">PyO3/maturin#3215</a></li> <li>Support pyo3 abi3t features on Python3.15 and PyO3 0.29 by <a href="https://github.com/ngoldbaum"><code>@ngoldbaum</code></a> in <a href="https://redirect.github.com/PyO3/maturin/pull/3113">PyO3/maturin#3113</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/GuYeying"><code>@GuYeying</code></a> made their first contribution in <a href="https://redirect.github.com/PyO3/maturin/pull/3202">PyO3/maturin#3202</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/PyO3/maturin/compare/v1.13.3...v1.14.0">https://github.com/PyO3/maturin/compare/v1.13.3...v1.14.0</a></p> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/PyO3/maturin/blob/main/Changelog.md">maturin's changelog</a>.</em></p> <blockquote> <h2>1.14.0</h2> <ul> <li>Support parent-relative pyproject metadata in sdists (<a href="https://redirect.github.com/pyo3/maturin/pull/3182">#3182</a>)</li> <li>Update PyPI platform tag validation (<a href="https://redirect.github.com/pyo3/maturin/pull/3187">#3187</a>)</li> <li>Maint: update setup emsdk action in generate-ci (<a href="https://redirect.github.com/pyo3/maturin/pull/3194">#3194</a>)</li> <li>Fix: only shim bin wheels during auditwheel repair (<a href="https://redirect.github.com/pyo3/maturin/pull/3197">#3197</a>)</li> <li>Fix: avoid editable ELF truncation from stale hardlinks (<a href="https://redirect.github.com/pyo3/maturin/pull/3199">#3199</a>)</li> <li>Fix Pyodide Emscripten platform tags (<a href="https://redirect.github.com/pyo3/maturin/pull/3191">#3191</a>)</li> <li>Use pax instead of GNU headers for tar (<a href="https://redirect.github.com/pyo3/maturin/pull/3203">#3203</a>)</li> <li>Feat: add default exclude <code>__pycache__</code> and <code>*.pyc</code> files (<a href="https://redirect.github.com/pyo3/maturin/pull/3202">#3202</a>)</li> <li>Add support 
for finding free-threaded interpreters for <code>--find-interpreters</code> (<a href="https://redirect.github.com/pyo3/maturin/pull/3206">#3206</a>)</li> <li>Stubs: also generate them for mixed PyO3 projects (<a href="https://redirect.github.com/pyo3/maturin/pull/3211">#3211</a>)</li> <li>Don't depend on CFFI on PyPy (<a href="https://redirect.github.com/pyo3/maturin/pull/3213">#3213</a>)</li> <li>Support pyo3 abi3t features on Python3.15 and PyO3 0.29 (<a href="https://redirect.github.com/pyo3/maturin/pull/3113">#3113</a>)</li> </ul> <h2>1.13.3</h2> <ul> <li>Fix: disable abi3 in pyo3 config for version-specific fallback builds (<a href="https://redirect.github.com/pyo3/maturin/pull/3180">#3180</a>)</li> </ul> <h2>1.13.2</h2> <ul> <li>Fix: resolve test failures in distro packaging environments (<a href="https://redirect.github.com/pyo3/maturin/pull/3129">#3129</a>)</li> <li>Fix: redirect tracing output to stderr to avoid breaking PEP 517 (<a href="https://redirect.github.com/pyo3/maturin/pull/3131">#3131</a>)</li> <li>Fix: skip interpreters with empty output for WSL2 cross-compile (<a href="https://redirect.github.com/pyo3/maturin/pull/3137">#3137</a>)</li> <li>Fix: set explicit lib_name in pyo3 config for Android abi3 cross-compilation (<a href="https://redirect.github.com/pyo3/maturin/pull/3130">#3130</a>)</li> <li>Chore: add sysconfig/cpython-freebsd-15.0-amd64.txt (<a href="https://redirect.github.com/pyo3/maturin/pull/3140">#3140</a>)</li> <li>Quote python-version in generated GitHub Actions workflow</li> <li>Update rustls-webpki</li> <li>Fix: two-phase bridge detection for conditional abi3 features (<a href="https://redirect.github.com/pyo3/maturin/pull/3144">#3144</a>)</li> <li>Update cargo-zigbuild to 0.22.2</li> <li>Update pyo3 to 0.28.3</li> <li>Treat pyo3 0.29.0+ as having Windows import lib support (raw-dylib) (<a href="https://redirect.github.com/pyo3/maturin/pull/3145">#3145</a>)</li> <li>Fix bin bindings with external shared library dependencies (<a 
href="https://redirect.github.com/pyo3/maturin/pull/3147">#3147</a>)</li> <li>Upgrade MSRV to 1.89.0 (<a href="https://redirect.github.com/pyo3/maturin/pull/3149">#3149</a>)</li> <li>Musllinux oci image (<a href="https://redirect.github.com/pyo3/maturin/pull/3152">#3152</a>)</li> <li>Remove Cirrus CI for FreeBSD (<a href="https://redirect.github.com/pyo3/maturin/pull/3156">#3156</a>)</li> <li>Perf: defer stage_artifact copy-back, finalize via rename when unpatched (<a href="https://redirect.github.com/pyo3/maturin/pull/3155">#3155</a>)</li> <li>Perf: eliminate stage_artifact double-copy, drop was_patched flag (<a href="https://redirect.github.com/pyo3/maturin/pull/3157">#3157</a>)</li> <li>Fix release pipeline (<a href="https://redirect.github.com/pyo3/maturin/pull/3158">#3158</a>)</li> <li>Auditwheel: copy unpatched cargo output back before in-place patching (<a href="https://redirect.github.com/pyo3/maturin/pull/3159">#3159</a>)</li> <li>Develop: fail loudly when pip leaves a stale <code>~</code> install behind (<a href="https://redirect.github.com/pyo3/maturin/issues/1922">#1922</a>) (<a href="https://redirect.github.com/pyo3/maturin/pull/3161">#3161</a>)</li> <li>Provide a link for the <code>lib.name</code> in Cargo.toml (<a href="https://redirect.github.com/pyo3/maturin/pull/3167">#3167</a>)</li> <li>Fix duplicated version in changelog (<a href="https://redirect.github.com/pyo3/maturin/pull/3171">#3171</a>)</li> <li>Switch to actions/attest from attest-build-provenance (<a href="https://redirect.github.com/pyo3/maturin/pull/3169">#3169</a>)</li> <li>Switch generation to actions/attest action, upgrade to v4 (<a href="https://redirect.github.com/pyo3/maturin/pull/3170">#3170</a>)</li> <li>Fix: avoid duplicate <code>--interpreter</code> panic in PEP 517 backend (<a href="https://redirect.github.com/pyo3/maturin/pull/3175">#3175</a>)</li> <li>Add trusted publishing options to <code>generate-ci</code> (<a 
href="https://redirect.github.com/pyo3/maturin/pull/3176">#3176</a>)</li> <li>Fix(sdist): handle symlinked <code>Cargo.toml</code> pointing outside project root (<a href="https://redirect.github.com/pyo3/maturin/pull/3178">#3178</a>)</li> <li>Stop install cffi for Python 3.8 in Dockerfile</li> <li>Fix: support pixi-managed virtualenvs in maturin develop (<a href="https://redirect.github.com/pyo3/maturin/pull/3165">#3165</a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/PyO3/maturin/commit/eb745e08f3eb8c5499420ed1ff4bce974b2ec972"><code>eb745e0</code></a> Release v1.14.0</li> <li><a href="https://github.com/PyO3/maturin/commit/6cdd0aa1d5b1e36905132db074d1f55087c4d46b"><code>6cdd0aa</code></a> fix cargo fmt</li> <li><a href="https://github.com/PyO3/maturin/commit/e18aba60daafc8bf50979a10093373becd85f1d9"><code>e18aba6</code></a> Support pyo3 abi3t features on Python3.15 and PyO3 0.29 (<a href="https://redirect.github.com/pyo3/maturin/issues/3113">#3113</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/8786c59657ab6cefea5d562213018aa8a5e3bbb1"><code>8786c59</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/pyo3/maturin/issues/3217">#3217</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/c76018bae4dc8400ef2b2989a4d273df942109bf"><code>c76018b</code></a> GeneratorOutput::additional_files remove Option (<a href="https://redirect.github.com/pyo3/maturin/issues/3215">#3215</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/a49370168e01d284bc57cf626693a569f396d2ee"><code>a493701</code></a> resolve_platform_tags: remove a redundant case (<a href="https://redirect.github.com/pyo3/maturin/issues/3214">#3214</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/7fb56481a7da334a565ea1b53e4ada7c7210d573"><code>7fb5648</code></a> Don't depend on CFFI on PyPy (<a 
href="https://redirect.github.com/pyo3/maturin/issues/3213">#3213</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/ea07638ab941ed6c70d0545a29a22c65fe3d92b8"><code>ea07638</code></a> Stubs: also generate them for mixed PyO3 projects (<a href="https://redirect.github.com/pyo3/maturin/issues/3211">#3211</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/2770b0de0b600951192044aaa923a29ac2a592c2"><code>2770b0d</code></a> build(deps): bump tar from 0.4.45 to 0.4.46 (<a href="https://redirect.github.com/pyo3/maturin/issues/3209">#3209</a>)</li> <li><a href="https://github.com/PyO3/maturin/commit/96198a8b7b2829efcb5463f07c3fa3eb6ce5530f"><code>96198a8</code></a> [pre-commit.ci] pre-commit autoupdate (<a href="https://redirect.github.com/pyo3/maturin/issues/3210">#3210</a>)</li> <li>Additional commits viewable in <a href="https://github.com/pyo3/maturin/compare/v1.13.3...v1.14.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…he#22973) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.81.8 to 2.81.11. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.81.11</h2> <ul> <li> <p>Update <code>wasm-tools@latest</code> to 1.252.0.</p> </li> <li> <p>Update <code>wasm-bindgen@latest</code> to 0.2.125.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.11.21.</p> </li> <li> <p>Update <code>protoc@latest</code> to 3.35.1.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.6.9.</p> </li> <li> <p>Update <code>jaq@latest</code> to 3.1.0.</p> </li> <li> <p>Update <code>cargo-insta@latest</code> to 1.48.0.</p> </li> <li> <p>Update <code>biome@latest</code> to 2.5.0.</p> </li> </ul> <h2>2.81.10</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 1.1.3.</p> </li> <li> <p>Update <code>release-plz@latest</code> to 0.3.159.</p> </li> <li> <p>Update <code>cosign@latest</code> to 3.1.1.</p> </li> </ul> <h2>2.81.9</h2> <ul> <li> <p>Update <code>wasm-bindgen@latest</code> to 0.2.123.</p> </li> <li> <p>Update <code>tombi@latest</code> to 1.1.2.</p> </li> <li> <p>Update <code>parse-changelog@latest</code> to 0.6.17.</p> </li> <li> <p>Update <code>just@latest</code> to 1.52.0.</p> </li> <li> <p>Update <code>gungraun-runner@latest</code> to 0.19.2.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.20.0.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <h2>[2.81.11] - 2026-06-15</h2> <ul> <li> <p>Update 
<code>wasm-tools@latest</code> to 1.252.0.</p> </li> <li> <p>Update <code>wasm-bindgen@latest</code> to 0.2.125.</p> </li> <li> <p>Update <code>uv@latest</code> to 0.11.21.</p> </li> <li> <p>Update <code>protoc@latest</code> to 3.35.1.</p> </li> <li> <p>Update <code>mise@latest</code> to 2026.6.9.</p> </li> <li> <p>Update <code>jaq@latest</code> to 3.1.0.</p> </li> <li> <p>Update <code>cargo-insta@latest</code> to 1.48.0.</p> </li> <li> <p>Update <code>biome@latest</code> to 2.5.0.</p> </li> </ul> <h2>[2.81.10] - 2026-06-11</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 1.1.3.</p> </li> <li> <p>Update <code>release-plz@latest</code> to 0.3.159.</p> </li> <li> <p>Update <code>cosign@latest</code> to 3.1.1.</p> </li> </ul> <h2>[2.81.9] - 2026-06-10</h2> <ul> <li> <p>Update <code>wasm-bindgen@latest</code> to 0.2.123.</p> </li> <li> <p>Update <code>tombi@latest</code> to 1.1.2.</p> </li> <li> <p>Update <code>parse-changelog@latest</code> to 0.6.17.</p> </li> <li> <p>Update <code>just@latest</code> to 1.52.0.</p> </li> <li> <p>Update <code>gungraun-runner@latest</code> to 0.19.2.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/15449e3094499af05d8d964a1c884208e4b8b595"><code>15449e3</code></a> Release 2.81.11</li> <li><a href="https://github.com/taiki-e/install-action/commit/dd3bb882c6c552e1478f55a048fb291cdc1f0ef1"><code>dd3bb88</code></a> Update cargo-deny manifest</li> <li><a href="https://github.com/taiki-e/install-action/commit/89ed2466baa251f4217203b0084736e4ba24365d"><code>89ed246</code></a> Update <code>wasm-tools@latest</code> to 1.252.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/3d2614471d361398dc4a241305a41a22246608cd"><code>3d26144</code></a> Update <code>wasm-bindgen@latest</code> to 0.2.125</li> <li><a href="https://github.com/taiki-e/install-action/commit/02039d84ff2195b312ef1612583b62012fcc7737"><code>02039d8</code></a> Update <code>uv@latest</code> to 0.11.21</li> <li><a href="https://github.com/taiki-e/install-action/commit/f70d988b18b30949b65c310be1e9e1fd34b1bd80"><code>f70d988</code></a> Update trivy manifest</li> <li><a href="https://github.com/taiki-e/install-action/commit/1f7b61373609cdc389de6178ea51d2a069d306ed"><code>1f7b613</code></a> Update <code>protoc@latest</code> to 3.35.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/ace4a195674b1e2219e629001c708779d52f0872"><code>ace4a19</code></a> Update prek manifest</li> <li><a href="https://github.com/taiki-e/install-action/commit/3fbf11909e319a495b93f6c640f257d1979a9cef"><code>3fbf119</code></a> Update <code>mise@latest</code> to 2026.6.9</li> <li><a href="https://github.com/taiki-e/install-action/commit/e62deab512318d610ad0b017fe426bc6cdb75e0f"><code>e62deab</code></a> Update <code>jaq@latest</code> to 3.1.0</li> <li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/0631aa6515c7d545823c67cfae7ef4fc7f490154...15449e3094499af05d8d964a1c884208e4b8b595">compare view</a></li> </ul> </details> <br /> 
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…to >=0.19.0,<1 in /docs (apache#22972) Updates the requirements on [pydata-sphinx-theme](https://github.com/pydata/pydata-sphinx-theme) to permit the latest version. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pydata/pydata-sphinx-theme/releases">pydata-sphinx-theme's releases</a>.</em></p> <blockquote> <h2>v0.19.0</h2> <h2>What's Changed</h2> <ul> <li>Bump pydata/pydata-sphinx-theme/.github/workflows/CI.yml from 178df7a9d69695be4e49fa56822d76b048977387 to 047227ad41f26a993ea7b2182955d26aa837acea by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2398">pydata/pydata-sphinx-theme#2398</a></li> <li>Bump pydata/pydata-sphinx-theme from 178df7a9d69695be4e49fa56822d76b048977387 to 047227ad41f26a993ea7b2182955d26aa837acea by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2397">pydata/pydata-sphinx-theme#2397</a></li> <li>Bump hynek/build-and-inspect-python-package from 2.17.0 to 2.18.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2396">pydata/pydata-sphinx-theme#2396</a></li> <li>Bump pydata/pydata-sphinx-theme/.github/workflows/docs.yml from 178df7a9d69695be4e49fa56822d76b048977387 to 047227ad41f26a993ea7b2182955d26aa837acea by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2395">pydata/pydata-sphinx-theme#2395</a></li> <li>Docs update Font Awesome links by <a href="https://github.com/hntk03"><code>@hntk03</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2399">pydata/pydata-sphinx-theme#2399</a></li> <li>[pre-commit.ci] pre-commit autoupdate hooks by <a 
href="https://github.com/pre-commit-ci"><code>@pre-commit-ci</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2347">pydata/pydata-sphinx-theme#2347</a></li> <li>[pre-commit.ci] pre-commit autoupdate hooks by <a href="https://github.com/pre-commit-ci"><code>@pre-commit-ci</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2401">pydata/pydata-sphinx-theme#2401</a></li> <li>Fix code coverage workflow by <a href="https://github.com/Yann-P"><code>@Yann-P</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2393">pydata/pydata-sphinx-theme#2393</a></li> <li>Bump actions/checkout from 6.0.2 to 6.0.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2406">pydata/pydata-sphinx-theme#2406</a></li> <li>Bump astral-sh/setup-uv from 8.1.0 to 8.2.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2407">pydata/pydata-sphinx-theme#2407</a></li> <li>Fix vertical scrollbar in math container by <a href="https://github.com/hntk03"><code>@hntk03</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2404">pydata/pydata-sphinx-theme#2404</a></li> <li>[css] De-jumble header items at intermediate widths :) by <a href="https://github.com/sneakers-the-rat"><code>@sneakers-the-rat</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/1784">pydata/pydata-sphinx-theme#1784</a></li> <li>Fix gettext builder by <a href="https://github.com/Yann-P"><code>@Yann-P</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2403">pydata/pydata-sphinx-theme#2403</a></li> <li>Bump 0.18.0 -> 0.19.0rc0 by <a href="https://github.com/Yann-P"><code>@Yann-P</code></a> in <a 
href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2410">pydata/pydata-sphinx-theme#2410</a></li> <li>fix: configure defaults for Google Analytics before loading script by <a href="https://github.com/tswast"><code>@tswast</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2369">pydata/pydata-sphinx-theme#2369</a></li> <li>Bump 0.19.0rc0 -> 0.19.0 by <a href="https://github.com/Yann-P"><code>@Yann-P</code></a> in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/2414">pydata/pydata-sphinx-theme#2414</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/sneakers-the-rat"><code>@sneakers-the-rat</code></a> made their first contribution in <a href="https://redirect.github.com/pydata/pydata-sphinx-theme/pull/1784">pydata/pydata-sphinx-theme#1784</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/pydata/pydata-sphinx-theme/compare/v0.18.0...v0.19.0">https://github.com/pydata/pydata-sphinx-theme/compare/v0.18.0...v0.19.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/01e209b34ce6d5992a1b53c748c15a0cd5ae32ee"><code>01e209b</code></a> Bump 0.19.0rc0 -> 0.19.0 (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2414">#2414</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/1c2612550b75599c39320c553e4d03687072f501"><code>1c26125</code></a> fix: configure defaults for Google Analytics before loading script (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2369">#2369</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/be7da51b4bd63ab941e5ef0c9a93112af9e47be2"><code>be7da51</code></a> Bump 0.18.0 -> 0.19.0rc0 (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2410">#2410</a>)</li> <li><a 
href="https://github.com/pydata/pydata-sphinx-theme/commit/0fb30d64bd803f06e4431ae4e2305c558237dd5e"><code>0fb30d6</code></a> Fix gettext builder (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2403">#2403</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/dd460e8af0ea0a13e57c0a7fadea30c5e8a6d724"><code>dd460e8</code></a> [css] De-jumble header items at intermediate widths :) (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/1784">#1784</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/db37b88a001ebc5a1537760a9f731d8a49dd36e4"><code>db37b88</code></a> Fix vertical scrollbar in math container (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2404">#2404</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/2b80896bf6dd0cb68798c7b77fa044da43d2414d"><code>2b80896</code></a> Bump astral-sh/setup-uv from 8.1.0 to 8.2.0 (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2407">#2407</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/bd863f8b0405a6890861888d021e634da0e0d15c"><code>bd863f8</code></a> Bump actions/checkout from 6.0.2 to 6.0.3 (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2406">#2406</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/f67c0362938e55f733eb0ebf756f3970aacdc6d0"><code>f67c036</code></a> Fix code coverage workflow (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2393">#2393</a>)</li> <li><a href="https://github.com/pydata/pydata-sphinx-theme/commit/a5b36001c0e936f85a58a825621da778bc84b343"><code>a5b3600</code></a> [pre-commit.ci] pre-commit autoupdate hooks (<a href="https://redirect.github.com/pydata/pydata-sphinx-theme/issues/2401">#2401</a>)</li> <li>Additional commits viewable in <a href="https://github.com/pydata/pydata-sphinx-theme/compare/v0.18.0...v0.19.0">compare view</a></li> </ul> 
</details> <br /> <details> <summary>Most Recent Ignore Conditions Applied to This Pull Request</summary> | Dependency Name | Ignore Conditions | | --- | --- | | pydata-sphinx-theme | [>= 0.16.dev0, < 0.17] | </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the proto group with 3 updates in the / directory: [prost](https://github.com/tokio-rs/prost), [prost-build](https://github.com/tokio-rs/prost) and [pbjson-types](https://github.com/influxdata/pbjson). Updates `prost` from 0.14.3 to 0.14.4 <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md">prost's changelog</a>.</em></p> <blockquote> <h1>Prost version 0.14.4</h1> <p><em>PROST!</em> is a <a href="https://protobuf.dev/">Protocol Buffers</a> implementation for the <a href="https://www.rust-lang.org/">Rust Language</a>. <code>prost</code> generates simple, idiomatic Rust code from <code>proto2</code> and <code>proto3</code> files.</p> <h3>🚀 Features</h3> <ul> <li><em>(prost-derive)</em> Make is_valid a constant function (<a href="https://redirect.github.com/tokio-rs/prost/issues/1401">#1401</a>)</li> <li>Increase MSRV to 1.85 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1428">#1428</a>)</li> </ul> <h3>🐛 Bug Fixes</h3> <ul> <li>Use Display instead of Debug for generated enumeration attributes (<a href="https://redirect.github.com/tokio-rs/prost/issues/1419">#1419</a>)</li> <li><em>(prost-derive)</em> Return error for invalid enumeration default identifiers (<a href="https://redirect.github.com/tokio-rs/prost/issues/1426">#1426</a>)</li> <li><em>(build)</em> Grab binary path from cargo (<a href="https://redirect.github.com/tokio-rs/prost/issues/1429">#1429</a>)</li> <li><em>(build)</em> Fix C++ build on GCC 15 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1395">#1395</a>)</li> </ul> <h3>📚 Documentation</h3> <ul> <li>Add example for <code>decode_length_delimiter</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1311">#1311</a>)</li> <li>Update protobuf-src example to avoid unsafe set_var</li> </ul> <h3>🧪 Testing</h3> <ul> <li>Test derive Eq behavior (<a href="https://redirect.github.com/tokio-rs/prost/issues/1422">#1422</a>)</li> 
<li><em>(groups)</em> Actually construct <code>NestedGroup</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1363">#1363</a>)</li> </ul> <h3>💼 Dependencies</h3> <ul> <li><em>(deps)</em> Update criterion requirement from 0.7 to 0.8 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1374">#1374</a>)</li> <li><em>(deps)</em> Remove <code>getrandom@0.4.1</code> from build-dependencies (<a href="https://redirect.github.com/tokio-rs/prost/issues/1400">#1400</a>)</li> <li><em>(deps)</em> Update rand requirement from 0.9 to 0.10 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1397">#1397</a>)</li> <li><em>(deps)</em> Bump actions/upload-artifact from 6 to 7 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1409">#1409</a>)</li> <li><em>(deps)</em> Update <code>cargo clippy</code> to 1.89 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1433">#1433</a>)</li> <li><em>(deps)</em> Update <code>cargo clippy</code> to 1.91 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1435">#1435</a>)</li> <li><em>(deps)</em> Update and improve nix devshell (<a href="https://redirect.github.com/tokio-rs/prost/issues/1393">#1393</a>)</li> </ul> <h3>🎨 Styling</h3> <ul> <li>Prevent needless borrow (<a href="https://redirect.github.com/tokio-rs/prost/issues/1404">#1404</a>)</li> <li>Use <code>std::hint::black_box()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1403">#1403</a>)</li> <li>Use variables directly in <code>format!()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1432">#1432</a>)</li> <li>Remove explicit <code>.into_iter()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1434">#1434</a>)</li> <li>Run clippy on benches (<a href="https://redirect.github.com/tokio-rs/prost/issues/1405">#1405</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a 
href="https://github.com/tokio-rs/prost/commit/13646cde7eab75c81b3047767aa0a86e7dbecf12"><code>13646cd</code></a> chore: Release version 0.14.4 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1437">#1437</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/dad79d5c8e3549d93ebe6f6c723bb42928d805d8"><code>dad79d5</code></a> fix(prost-derive): return error for invalid enumeration default identifiers (...</li> <li><a href="https://github.com/tokio-rs/prost/commit/b0b6c93e3aac89df28690a4967a8bbe93ec95391"><code>b0b6c93</code></a> ci: Update <code>cargo clippy</code> to 1.91 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1435">#1435</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/32cfffbc494f2faf461cab85e04a42412484c0e4"><code>32cfffb</code></a> style: remove explicit <code>.into_iter()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1434">#1434</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/2710efdb9978d9c75fb19b0b092a369a2d385b55"><code>2710efd</code></a> ci: Update <code>cargo clippy</code> to 1.89 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1433">#1433</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/18ea4e42bbc307d33d65e05ad47b3c45623c0500"><code>18ea4e4</code></a> style: use variables directly in <code>format!()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1432">#1432</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/2821bd1d8c20137c83ead4db39f8e1da00b4e854"><code>2821bd1</code></a> build(deps): bump actions/upload-artifact from 6 to 7 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1409">#1409</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/3ce3b39f9206b5e3bbe34c6e1aa69fe3c53f0924"><code>3ce3b39</code></a> test(groups): Actually construct <code>NestedGroup</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1363">#1363</a>)</li> <li><a 
href="https://github.com/tokio-rs/prost/commit/8776405574b3ba0a0fe96ada8799ac8bc61ceb3e"><code>8776405</code></a> docs: Update changelog for version 0.14.3 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1431">#1431</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/33d3ef18c008da13e862d7e7674d751ab2776360"><code>33d3ef1</code></a> build: Grab binary path from cargo (<a href="https://redirect.github.com/tokio-rs/prost/issues/1429">#1429</a>)</li> <li>Additional commits viewable in <a href="https://github.com/tokio-rs/prost/compare/v0.14.3...v0.14.4">compare view</a></li> </ul> </details> <br /> Updates `prost-build` from 0.14.3 to 0.14.4 <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md">prost-build's changelog</a>.</em></p> <blockquote> <h1>Prost version 0.14.4</h1> <p><em>PROST!</em> is a <a href="https://protobuf.dev/">Protocol Buffers</a> implementation for the <a href="https://www.rust-lang.org/">Rust Language</a>. 
<code>prost</code> generates simple, idiomatic Rust code from <code>proto2</code> and <code>proto3</code> files.</p> <h3>🚀 Features</h3> <ul> <li><em>(prost-derive)</em> Make is_valid a constant function (<a href="https://redirect.github.com/tokio-rs/prost/issues/1401">#1401</a>)</li> <li>Increase MSRV to 1.85 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1428">#1428</a>)</li> </ul> <h3>🐛 Bug Fixes</h3> <ul> <li>Use Display instead of Debug for generated enumeration attributes (<a href="https://redirect.github.com/tokio-rs/prost/issues/1419">#1419</a>)</li> <li><em>(prost-derive)</em> Return error for invalid enumeration default identifiers (<a href="https://redirect.github.com/tokio-rs/prost/issues/1426">#1426</a>)</li> <li><em>(build)</em> Grab binary path from cargo (<a href="https://redirect.github.com/tokio-rs/prost/issues/1429">#1429</a>)</li> <li><em>(build)</em> Fix C++ build on GCC 15 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1395">#1395</a>)</li> </ul> <h3>📚 Documentation</h3> <ul> <li>Add example for <code>decode_length_delimiter</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1311">#1311</a>)</li> <li>Update protobuf-src example to avoid unsafe set_var</li> </ul> <h3>🧪 Testing</h3> <ul> <li>Test derive Eq behavior (<a href="https://redirect.github.com/tokio-rs/prost/issues/1422">#1422</a>)</li> <li><em>(groups)</em> Actually construct <code>NestedGroup</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1363">#1363</a>)</li> </ul> <h3>💼 Dependencies</h3> <ul> <li><em>(deps)</em> Update criterion requirement from 0.7 to 0.8 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1374">#1374</a>)</li> <li><em>(deps)</em> Remove <code>getrandom@0.4.1</code> from build-dependencies (<a href="https://redirect.github.com/tokio-rs/prost/issues/1400">#1400</a>)</li> <li><em>(deps)</em> Update rand requirement from 0.9 to 0.10 (<a 
href="https://redirect.github.com/tokio-rs/prost/issues/1397">#1397</a>)</li> <li><em>(deps)</em> Bump actions/upload-artifact from 6 to 7 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1409">#1409</a>)</li> <li><em>(deps)</em> Update <code>cargo clippy</code> to 1.89 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1433">#1433</a>)</li> <li><em>(deps)</em> Update <code>cargo clippy</code> to 1.91 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1435">#1435</a>)</li> <li><em>(deps)</em> Update and improve nix devshell (<a href="https://redirect.github.com/tokio-rs/prost/issues/1393">#1393</a>)</li> </ul> <h3>🎨 Styling</h3> <ul> <li>Prevent needless borrow (<a href="https://redirect.github.com/tokio-rs/prost/issues/1404">#1404</a>)</li> <li>Use <code>std::hint::black_box()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1403">#1403</a>)</li> <li>Use variables directly in <code>format!()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1432">#1432</a>)</li> <li>Remove explicit <code>.into_iter()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1434">#1434</a>)</li> <li>Run clippy on benches (<a href="https://redirect.github.com/tokio-rs/prost/issues/1405">#1405</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/tokio-rs/prost/commit/13646cde7eab75c81b3047767aa0a86e7dbecf12"><code>13646cd</code></a> chore: Release version 0.14.4 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1437">#1437</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/dad79d5c8e3549d93ebe6f6c723bb42928d805d8"><code>dad79d5</code></a> fix(prost-derive): return error for invalid enumeration default identifiers (...</li> <li><a href="https://github.com/tokio-rs/prost/commit/b0b6c93e3aac89df28690a4967a8bbe93ec95391"><code>b0b6c93</code></a> ci: Update <code>cargo clippy</code> to 1.91 (<a 
href="https://redirect.github.com/tokio-rs/prost/issues/1435">#1435</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/32cfffbc494f2faf461cab85e04a42412484c0e4"><code>32cfffb</code></a> style: remove explicit <code>.into_iter()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1434">#1434</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/2710efdb9978d9c75fb19b0b092a369a2d385b55"><code>2710efd</code></a> ci: Update <code>cargo clippy</code> to 1.89 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1433">#1433</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/18ea4e42bbc307d33d65e05ad47b3c45623c0500"><code>18ea4e4</code></a> style: use variables directly in <code>format!()</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1432">#1432</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/2821bd1d8c20137c83ead4db39f8e1da00b4e854"><code>2821bd1</code></a> build(deps): bump actions/upload-artifact from 6 to 7 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1409">#1409</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/3ce3b39f9206b5e3bbe34c6e1aa69fe3c53f0924"><code>3ce3b39</code></a> test(groups): Actually construct <code>NestedGroup</code> (<a href="https://redirect.github.com/tokio-rs/prost/issues/1363">#1363</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/8776405574b3ba0a0fe96ada8799ac8bc61ceb3e"><code>8776405</code></a> docs: Update changelog for version 0.14.3 (<a href="https://redirect.github.com/tokio-rs/prost/issues/1431">#1431</a>)</li> <li><a href="https://github.com/tokio-rs/prost/commit/33d3ef18c008da13e862d7e7674d751ab2776360"><code>33d3ef1</code></a> build: Grab binary path from cargo (<a href="https://redirect.github.com/tokio-rs/prost/issues/1429">#1429</a>)</li> <li>Additional commits viewable in <a href="https://github.com/tokio-rs/prost/compare/v0.14.3...v0.14.4">compare view</a></li> </ul> </details> <br /> Updates 
`pbjson-types` from 0.8.0 to 0.9.0 <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/influxdata/pbjson/commits/0.9.0">compare view</a></li> </ul> </details> <br /> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jefffrey <jeffrey.vo.australia@gmail.com>
…e#22230)

~~## Note: This PR depends on apache#21882 (pluggable SpillFile trait) and cannot be merged before it. Opening in parallel per @alamb's suggestion for easier review. The required SpillFile trait used here is defined in that base PR. To review locally, apply apache#21882 first and then stack this branch on top.~~

**Update:** This PR has been rebased to use the existing `RefCountedTempFile` and is now completely standalone. It can be reviewed and merged independently.

## Which issue does this PR close?

- Contributes to apache#21215 (and is required by apache#21882)

## Rationale for this change

`materializing_stream.rs` and `bitwise_stream.rs` were reading spilled batches via `open_sync_reader` / direct `File::open` calls ~~, bypassing the `SpillFile` abstraction introduced in apache#21882~~. This PR migrates both to use `SpillManager::read_spill_as_stream`. This safely converts the SMJ to an async I/O path, preparing the ground for custom backends (Postgres BufFile, object storage) to handle spill reads without requiring an OS file path.

## What changes are included in this PR?

- `materializing_stream.rs`: Eagerly restores spilled `BufferedBatches` via async streams before freezing, avoiding new state machine variants.
- `bitwise_stream.rs`: Replaces sync reads with an async `poll_next_unpin` loop, caching the stream to survive `Poll::Pending`.

~~- `spill_file.rs`: Removes `open_sync_reader` from the `SpillFile` trait (no longer needed).~~

## Are these changes tested?

Covered by existing SMJ tests. No new tests added; the behavioral change is internal (sync → async IO path).

## Are there any user-facing changes?

No. ~~Removes `open_sync_reader` from the SpillFile trait, this is a breaking API change for anyone implementing the trait, but the trait was introduced in apache#21882 which has not merged yet so there are no external implementors.~~

---------

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>
When I run `./datafusion/proto-models/regen.sh` from repository root off
main, I'm getting a dirty git state like so:
```sh
datafusion (main)$ ./datafusion/proto-models/regen.sh
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s
Running `/Users/jeffrey/.cargo_target_cache/debug/gen`
Copying datafusion/proto-models/src/datafusion.rs to datafusion/proto-models/src/generated/prost.rs
datafusion (main)$ git status
On branch main
Your branch is up to date with 'upstream/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
datafusion/proto-models/proto/proto_descriptor.bin
datafusion/proto-models/src/datafusion.rs
datafusion/proto-models/src/datafusion.serde.rs
datafusion/proto-models/src/datafusion_common.rs
nothing added to commit but untracked files present (use "git add" to track)
```
This PR copies over the `.gitignore` from `proto` to fix this:
https://github.com/apache/datafusion/blob/2282d23d4ff0af91463b63aa99cd793635ecef8e/datafusion/proto/.gitignore#L1-L5
Related PR:
- apache#21929
## Which issue does this PR close?

- Closes apache#22955

## Rationale for this change

This fixes a bug in the extract leaf expressions logic.

## What changes are included in this PR?

This is a one-line change that sanity-checks that the schema has the same length when we are doing expression pushdown.

## Are these changes tested?

Yes, a couple of tests have been added.

## Are there any user-facing changes?

Nope!

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
## Which issue does this PR close?

- Part of apache#20746 [EPIC] Complete DML Support (MERGE, INSERT OVERWRITE, TRUNCATE) apache#19617
- As well as task 1 of apache#20746

## Rationale for this change

Lay the foundation for MERGE INTO support in DataFusion by adding the logical plan types and their proto serialization. Keeping types separate from execution lets reviewers reason about the data model independently of the planner and physical dispatch.

## What changes are included in this PR?

**`datafusion/expr`: new types in `dml.rs`**

- `MergeIntoOp`: carries the `ON` join condition and ordered list of `WHEN` clauses
- `MergeIntoClause`: a single `WHEN` clause: kind + optional predicate + action
- `MergeIntoClauseKind`: `Matched` / `NotMatched` / `NotMatchedByTarget` / `NotMatchedBySource`; includes `is_not_matched_by_target()` and `canonical()` helpers because `NotMatched` and `NotMatchedByTarget` are semantically identical and must be treated identically downstream
- `MergeIntoAction`: `Update(Vec<(col, expr)>)` / `Insert { columns, values }` / `Delete`
- `WriteOp::MergeInto(MergeIntoOp)` variant added to the existing `WriteOp` enum; `WriteOp` is now `#[non_exhaustive]` so future variant additions are not a SemVer break

**`datafusion/proto-models`: proto schema**

- Extended `DmlNode` with a `MERGE_INTO` type tag and a boxed `MergeIntoOpNode` payload field
- Added `MergeIntoOpNode`, `MergeIntoClauseNode`, `MergeIntoActionNode` messages

**`datafusion/proto`: serialization**

- `from_proto`: `parse_write_op(&DmlNode, ...)` reads the payload when the type tag is `MergeInto`; defensive helpers `parse_merge_into_op/clause/action` with explicit errors for missing fields
- `to_proto`: `serialize_merge_into_op/clause/action` helpers; the encode path uses an explicit `match` over all `WriteOp` variants producing a `(dml_type, merge_into)` pair, so no payload is silently lost
- Cross-crate conversions use `FromProto` (the crate-local trait) rather than `From` to satisfy the Rust orphan rule after the upstream `datafusion-proto-models` refactor

**Proto codegen**: after editing `.proto` files, regenerate with:

```bash
PROTOC=/tmp/protoc cargo run --manifest-path datafusion/proto-models/gen/Cargo.toml
```

(Install `protoc` from https://github.com/protocolbuffers/protobuf/releases if not present; set `PROTOC` to its path.)

## Are these changes tested?

- `datafusion-expr` unit tests: `WriteOp::MergeInto` display, `is_not_matched_by_target`, `canonical`
- `datafusion-proto` round-trip test: exercises all four `MergeIntoClauseKind` variants and all three `MergeIntoAction` variants through encode → decode
- `datafusion-proto` error-path tests: missing `merge_into` payload, missing `on` expression, unknown clause kind tag, missing clause action, missing action oneof

## Are there any user-facing changes?

`WriteOp` gains a `MergeInto` variant and is now `#[non_exhaustive]`. Existing downstream `match` expressions need a wildcard arm added (this is intentional and expected for a new DML operation).

## Follow-up

A stacking PR that adds the SQL planner, physical planner dispatch, and `TableProvider::merge_into` hook is available at #2. If reviewers prefer to review both together in one pass, I'm happy to include that work here instead.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
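To make the `canonical()` / `is_not_matched_by_target()` helpers concrete, here is a minimal, std-only sketch. The enum `ClauseKind` and the function `describe` are hypothetical stand-ins, not the actual `MergeIntoClauseKind` from `datafusion-expr`, and the direction of canonicalization (`NotMatched` becomes `NotMatchedByTarget`) is an assumption; the description above only says the two variants must be treated identically.

```rust
// Hypothetical mirror of the clause-kind enum described above; the real
// `MergeIntoClauseKind` in `datafusion-expr` may differ in detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClauseKind {
    Matched,
    NotMatched,
    NotMatchedByTarget,
    NotMatchedBySource,
}

impl ClauseKind {
    /// True for both spellings of "no matching target row".
    pub fn is_not_matched_by_target(self) -> bool {
        matches!(self, ClauseKind::NotMatched | ClauseKind::NotMatchedByTarget)
    }

    /// Assumption: collapse `NotMatched` into `NotMatchedByTarget` so that
    /// downstream code only has to handle one of the two equivalent variants.
    pub fn canonical(self) -> ClauseKind {
        match self {
            ClauseKind::NotMatched => ClauseKind::NotMatchedByTarget,
            other => other,
        }
    }
}

/// Downstream-style match: canonicalize first, and keep a wildcard arm, which
/// is also what a `#[non_exhaustive]` enum requires of code in other crates.
pub fn describe(kind: ClauseKind) -> &'static str {
    match kind.canonical() {
        ClauseKind::Matched => "WHEN MATCHED",
        ClauseKind::NotMatchedByTarget => "WHEN NOT MATCHED [BY TARGET]",
        ClauseKind::NotMatchedBySource => "WHEN NOT MATCHED BY SOURCE",
        // Only reachable if `canonical()` ever returns a variant not listed above.
        _ => "unknown clause kind",
    }
}
```

Canonicalizing before matching means the planner and any `TableProvider` implementation see a single representation for the two equivalent spellings.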
…duplicated logic (apache#22907)

## Which issue does this PR close?

Follow-up to apache#21794, addressing review feedback from @alamb and @Dandandan.

## Rationale for this change

In the review of apache#21794, several optimizations were suggested:

- **@alamb** ([comment](apache#21794 (comment))): "as a possible future optimization, we could use `get_unchecked` here if it makes any difference" (referring to the slow-path `arr.views()[row]` access).
- **@alamb** ([comment](apache#21794 (comment))): "from here on down I think this is basically the same as append_val_inner -- if there are any differences perhaps we can fold it into append_val_inner and avoid the copy"
- **@Dandandan** ([comment](apache#21794 (comment))): "In principle we can make this faster as well - `extend` + reuse input view (instead of make_view) + avoid `array.value(row)`"

## What changes are included in this PR?

**1. Refactored `do_append_val_inner` to use raw view access**

Replaced `array.value(row)` + `make_view()` with raw view access via `get_unchecked(row)`:

- **Inline (len <= 12):** push the u128 view as-is, with no decode/re-encode round-trip
- **Non-inline (len > 12):** parse via `ByteView::from(view)`, copy buffer data, reuse source prefix directly (avoids re-reading first 4 bytes)

**2. Simplified the vectorized slow path**

Replaced the duplicated 28-line loop body in `vectorized_append_inner` with `try_reserve` + a loop calling `do_append_val_inner`, eliminating code duplication.

**3. Removed unused `make_view` import**

### Safety notes

- **`get_unchecked` usage**: Consistent with `do_equal_to_inner` (same file) and `PrimitiveGroupValueBuilder` in `primitive.rs`, both of which use the same pattern. All callers derive row indices from enumeration over the input array length, guaranteeing validity.
- **Buffer access safety**: When `data_buffers()` is empty, all views must have len <= 12 (Arrow invariant), so the non-inline branch is never entered.

## Are these changes tested?

Covered by 6 existing unit tests in the `bytes_view` module plus 3 integration tests in the `multi_group_by` module. All 111 tests in the aggregates suite pass.

## Are there any user-facing changes?

No. This is an internal refactor with no API changes.
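The inline / non-inline split above follows from how a 16-byte Arrow view is laid out: the low 4 bytes hold the length, and values of at most 12 bytes keep their data in the remaining 12 bytes, while longer values keep a 4-byte prefix followed by a buffer index and offset. The sketch below illustrates only that decision using plain `u128` arithmetic; `make_inline_view`, `classify`, and `AppendPath` are hypothetical names and this is not the code from the PR.

```rust
/// Views of at most this many bytes store their data inline (Arrow view layout).
const MAX_INLINE_LEN: u32 = 12;

/// The length occupies the low 32 bits of the little-endian `u128` view.
pub fn view_len(view: u128) -> u32 {
    view as u32
}

pub fn is_inline(view: u128) -> bool {
    view_len(view) <= MAX_INLINE_LEN
}

/// Hypothetical helper: build an inline view from at most 12 bytes.
pub fn make_inline_view(data: &[u8]) -> Option<u128> {
    if data.len() > MAX_INLINE_LEN as usize {
        return None;
    }
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    Some(u128::from_le_bytes(bytes))
}

/// Which path an append would take for a given source view.
#[derive(Debug, PartialEq, Eq)]
pub enum AppendPath {
    /// Inline view: can be pushed unchanged, no decode/re-encode needed.
    PushAsIs,
    /// Non-inline view: buffer data must be copied; the 4-byte prefix
    /// stored in the view can be reused without re-reading the data.
    CopyFromBuffer { len: u32, prefix: [u8; 4] },
}

pub fn classify(view: u128) -> AppendPath {
    if is_inline(view) {
        AppendPath::PushAsIs
    } else {
        let bytes = view.to_le_bytes();
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[4..8]);
        AppendPath::CopyFromBuffer { len: view_len(view), prefix }
    }
}
```

Because the inline case never touches a data buffer, a source array with no data buffers can only ever take the `PushAsIs` path, which matches the buffer access safety note above.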
…che#22783)

## Which issue does this PR close?

Closes apache#22769

## Rationale for this change

`LogicalPlan::Unnest` had an inconsistent API contract: `apply_expressions()` exposed `exec_columns` but `with_new_exprs()` rejected them via `assert_no_expressions`. This broke the standard `node.with_new_exprs(node.expressions(), new_inputs)` pattern.

## What changes are included in this PR?

- `with_new_exprs` now accepts expressions from `apply_expressions` (extracts `Column` values back out)
- `map_expressions` now properly transforms `exec_columns` instead of treating Unnest as expressionless
- Removed stale comment in `extract_leaf_expressions` (semantic barrier remains)

## Are these changes tested?

Yes: two new unit tests prove that both `with_new_exprs(expressions(), inputs)` and `with_new_exprs(vec![], inputs)` work. All existing optimizer and SLT tests pass.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

- Closes apache#9430.

## Rationale for this change

Users frequently want to pipe data into the CLI, e.g. `cat data.csv | datafusion-cli`, but pointing `LOCATION` at `/dev/stdin` did not work:

- CSV failed with `Illegal seek` (a pipe is not seekable).
- Parquet failed with `file size of 0 is less than footer` (a pipe reports size 0).
- JSON silently returned 0 rows.

This PR makes reading from standard input work for CSV, JSON, and Parquet.

## What changes are included in this PR?

stdin is exposed as a `stdin://` object store, dispatched alongside the other schemes (`s3`, `gs`, `http`, ...) in `get_object_store`, conceptually similar to DuckDB's `PipeFileSystem`.

- `rewrite_stdin_location` maps the well-known stdin pseudo-paths (`/dev/stdin`, `/dev/fd/0`, `/proc/self/fd/0`) to a canonical `stdin:///stdin.<ext>` URL, so they flow through the normal object-store/listing code path. The extension matches the declared `STORED AS` format because the listing layer filters candidate files by extension.
- The `stdin://` store reads all of standard input into an in-memory object store. Buffering up front is required because a pipe is not seekable and Parquet stores its metadata at the end of the file.

Known scope/limitations (left as potential follow-ups):

- Only `CREATE EXTERNAL TABLE` is supported (not dynamic `SELECT * FROM '/dev/stdin'`).
- Input is fully buffered in memory, so it must fit in memory.
- stdin can only be consumed once per session.
- Unix-only (`/dev/stdin` does not exist on Windows); writing to `/dev/stdout` is out of scope.

## Are these changes tested?

Yes:

- Unit tests in `object_storage.rs` cover `rewrite_stdin_location` and end-to-end reads for CSV, JSON, and Parquet via the in-memory store.
- A `#[cfg(unix)]` integration test in `cli_integration.rs` drives the real binary through an actual pipe, exercising the real stdin read.
- Manually verified all three formats via real pipes, and confirmed normal local-file reads are unaffected.

## Are there any user-facing changes?

Yes: reading from stdin via `LOCATION '/dev/stdin'` is now supported. Documented in `docs/source/user-guide/cli/datasources.md` (new "Reading from standard input" section). No breaking changes.
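The path rewrite described above can be sketched in a few lines. This is an illustration only: the function name matches the description, but the signature (a location plus a file extension, returning `Option<String>`) is an assumption and the real `rewrite_stdin_location` in `datafusion-cli` may look different.

```rust
/// Hypothetical sketch: map a well-known stdin pseudo-path to the canonical
/// `stdin:///stdin.<ext>` URL, or return `None` for any other location.
pub fn rewrite_stdin_location(location: &str, extension: &str) -> Option<String> {
    // The three pseudo-paths named in the description above.
    const STDIN_PATHS: [&str; 3] = ["/dev/stdin", "/dev/fd/0", "/proc/self/fd/0"];

    if STDIN_PATHS.contains(&location) {
        // The extension mirrors the declared `STORED AS` format, because the
        // listing layer filters candidate files by extension.
        Some(format!("stdin:///stdin.{extension}"))
    } else {
        None
    }
}
```

Returning `None` for ordinary paths leaves normal local-file reads on their existing code path.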
Add merge_into async method to TableProvider trait for MERGE INTO DML support.

The method accepts:
- source: ExecutionPlan representing the USING clause
- on: Expr representing the ON join condition
- clauses: Vec<MergeIntoClause> for WHEN MATCHED/NOT MATCHED actions

Default implementation returns not_impl_err for tables that don't support MERGE INTO operations.
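The "default method returns not-implemented" pattern in this commit can be illustrated without DataFusion. In the sketch below every name (`Provider`, `SourcePlan`, `Expr`, `MergeClause`, `PlanError`) is a hypothetical stand-in, the method is synchronous rather than async, and it returns a row count instead of an execution plan; it only shows why existing implementors keep compiling.

```rust
/// Stand-in error type; the real code uses DataFusion's `not_impl_err`.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

// Hypothetical stand-ins for the source plan, ON expression and WHEN clauses.
pub struct SourcePlan;
pub struct Expr(pub String);
pub struct MergeClause;

pub trait Provider {
    fn name(&self) -> &str;

    /// Default hook: providers that do not override it report that
    /// MERGE INTO is unsupported instead of failing to compile.
    fn merge_into(
        &self,
        _source: &SourcePlan,
        _on: &Expr,
        _clauses: &[MergeClause],
    ) -> Result<u64, PlanError> {
        Err(PlanError::NotImplemented(format!(
            "MERGE INTO not supported for {} table",
            self.name()
        )))
    }
}

/// An existing implementor: unaffected by the new method.
pub struct ReadOnlyTable;

impl Provider for ReadOnlyTable {
    fn name(&self) -> &str {
        "ReadOnlyTable"
    }
}

/// An implementor that opts in by overriding the hook.
pub struct CountingTable;

impl Provider for CountingTable {
    fn name(&self) -> &str {
        "CountingTable"
    }

    fn merge_into(
        &self,
        _source: &SourcePlan,
        _on: &Expr,
        clauses: &[MergeClause],
    ) -> Result<u64, PlanError> {
        // Toy behavior: report how many clauses were received.
        Ok(clauses.len() as u64)
    }
}
```

A provider that supports MERGE overrides the hook; every other provider inherits the error path unchanged.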
Implement merge_to_plan and merge_clause_to_plan in SQL planner:
- Parse Statement::Merge into LogicalPlan::Dml with WriteOp::MergeInto
- Resolve target table and plan source (USING clause) as LogicalPlan
- Build combined schema for target + source to resolve ON and WHEN expressions
- Convert ON condition and WHEN clauses to DataFusion Expr
- Handle UPDATE, INSERT, and DELETE actions in WHEN clauses

Add physical planner dispatch for WriteOp::MergeInto:
- Use source_as_provider() to recover the TableProvider from the TableSource
- Extract source ExecutionPlan from children
- Call TableProvider::merge_into with source plan, ON condition, and clauses
- Wrap errors with MERGE INTO operation context

Wire MergeInto's expressions through LogicalPlan tree-traversal so optimizers can rewrite them: add MergeIntoOp::exprs() (stable iteration order: on, then per-clause predicate + action value Exprs) and MergeIntoOp::with_new_exprs() to rebuild the op from a transformed expr vector. Branch LogicalPlan::apply_expressions, map_expressions, and with_new_exprs on WriteOp::MergeInto to use these helpers; other WriteOp variants continue to expose no expressions as before.
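The `exprs()` / `with_new_exprs()` contract only works if both sides agree on one iteration order. The following std-only sketch shows that round trip with expressions modeled as plain strings; `MergeOp` and `Clause` are hypothetical simplifications of `MergeIntoOp` and `MergeIntoClause`, and the error handling is an assumption.

```rust
/// Expressions are modeled as strings to keep the sketch self-contained.
pub type Expr = String;

#[derive(Debug, Clone, PartialEq)]
pub struct Clause {
    pub predicate: Option<Expr>,
    pub action_values: Vec<Expr>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergeOp {
    pub on: Expr,
    pub clauses: Vec<Clause>,
}

impl MergeOp {
    /// Stable order: `on`, then for each clause its predicate (if any)
    /// followed by its action value expressions.
    pub fn exprs(&self) -> Vec<&Expr> {
        let mut out = vec![&self.on];
        for clause in &self.clauses {
            out.extend(clause.predicate.iter());
            out.extend(clause.action_values.iter());
        }
        out
    }

    /// Rebuild the op from expressions produced in the same order as `exprs()`.
    pub fn with_new_exprs(&self, new_exprs: Vec<Expr>) -> Result<MergeOp, String> {
        let expected = self.exprs().len();
        if new_exprs.len() != expected {
            return Err(format!(
                "expected {expected} expressions, got {}",
                new_exprs.len()
            ));
        }
        let mut it = new_exprs.into_iter();
        let on = it.next().expect("length checked above");
        let mut clauses = Vec::with_capacity(self.clauses.len());
        for clause in &self.clauses {
            // Consume exactly as many expressions as this clause exposed.
            let predicate = clause
                .predicate
                .as_ref()
                .map(|_| it.next().expect("length checked above"));
            let mut action_values = Vec::with_capacity(clause.action_values.len());
            for _ in &clause.action_values {
                action_values.push(it.next().expect("length checked above"));
            }
            clauses.push(Clause { predicate, action_values });
        }
        Ok(MergeOp { on, clauses })
    }
}
```

An optimizer rewrite would call `exprs()`, transform each expression, and pass the resulting vector to `with_new_exprs()`; passing the expressions back unchanged should reproduce the original op.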
8d7fcff to
cb89e53
Compare
Owner
Author
Check the apache/datafusion repository instead. Closing this personal draft.
## Which issue does this PR close?

Stacked on top of apache#20763 (Add MERGE INTO types to datafusion-expr).

## Rationale for this change

Complete the MERGE INTO execution path so that the type definitions added in the parent PR can be planned and dispatched.

## What changes are included in this PR?

- `datafusion/catalog/src/table.rs`: extend `TableProvider` with a `merge_into` async hook (default returns `not_impl_err`).
- `datafusion/sql/src/statement.rs`: SQL planner:
  - `merge_to_plan`: parse `Statement::Merge` into `LogicalPlan::Dml(WriteOp::MergeInto(...))`. Resolves the target table, plans the `USING` source, builds a combined schema for resolving `ON` and `WHEN` expressions.
  - `merge_clause_to_plan`: converts each `WHEN MATCHED / NOT MATCHED` clause into a `MergeIntoClause` with typed `MergeIntoAction`.
- `datafusion/expr/src/logical_plan/dml.rs`: expression traversal on `MergeIntoOp`:
  - `exprs()`: stable iteration (on expr → per-clause predicate → action value exprs).
  - `with_new_exprs()`: reconstruct op from a transformed expr slice (used by optimizer rewrites).
- `datafusion/expr/src/logical_plan/plan.rs` / `tree_node.rs`: branch `apply_expressions`, `map_expressions`, `with_new_exprs` on `WriteOp::MergeInto` to delegate to the helpers above; other `WriteOp` variants are unchanged.
- `datafusion/core/src/physical_planner.rs`: physical dispatch for `WriteOp::MergeInto`:
  - Recover the `TableProvider` via `source_as_provider()`.
  - Extract the source `ExecutionPlan` from children.
  - Call `TableProvider::merge_into(source_plan, on_expr, clauses)`.

## Are these changes tested?

Unit and integration tests are in the parent PR (apache#20763). End-to-end sqllogictests covering a concrete `TableProvider::merge_into` implementation are planned as follow-up once a reference implementation exists.

## Are there any user-facing changes?

`TableProvider` gains a new method `merge_into` with a default `not_impl_err` implementation, so existing implementors are unaffected.