Upgrade Rust toolchain to nightly-2025-12-04 #4597
Conversation
Pass --timeout to compiletest in scripts/kani-perf.sh so a single runaway perf case (e.g. an OOM-prone harness) cannot hold the GitHub runner indefinitely. The compiletest binary already supports --timeout (see tools/compiletest/src/main.rs). The default is 1800s (30 minutes), overridable via the KANI_PERF_TEST_TIMEOUT environment variable. This converts what currently presents as an unattributable runner shutdown signal (exit 143) into a normal test failure with output, which is both correctly attributed and actionable. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
Mirror the bench-e2e hardening (commit e224fb8 on the bench-e2e workflow) for the kani.yml perf job:

- `timeout-minutes: 90` distinguishes a real runaway from infra preemption.
- `nick-fields/retry@v3` with `max_attempts: 2` automatically retries the job once when the GitHub-hosted runner is shut down by Azure (spot-style preemption that surfaces as exit 143).

Per-test wall time is already bounded inside scripts/kani-perf.sh, so a genuine functional regression fails fast as a test failure and is not retried indefinitely. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
**Bisect result: rust-lang/rust#146436 ("Slice iter cleanup")**

I ran …
The first-bad rollup is rust-lang/rust#149560, a 5-PR rollup. Of those 5:
By elimination + code-path correlation, #146436 is the regression source. (Direct per-commit verification within the PR is not possible from CI artifacts, since only bors auto-merges have prebuilt rustc; it would require a rustc source build.)
**Root cause analysis and sound mitigation plan**

The bisect landed on rust-lang/rust#146436 ("Slice iter cleanup"). Here is why that change explodes Kani's symex on this harness.

**What #146436 actually changed (the relevant excerpt)**
Before:

```rust
fn next(&mut self) -> Option<&'a [T]> {
    if self.v.len() < self.chunk_size {
        None
    } else {
        let (fst, snd) = self.v.split_at(self.chunk_size);
        self.v = snd;
        Some(fst)
    }
}
```

After:

```rust
fn next(&mut self) -> Option<&'a [T]> {
    self.v.split_at_checked(self.chunk_size).and_then(|(chunk, rest)| {
        self.v = rest;
        Some(chunk)
    })
}
```
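The two shapes are behaviorally identical, which is why this is invisible to ordinary testing. A standalone sketch (free functions over a plain slice, not the real core-library iterator struct) makes the equivalence concrete:

```rust
fn next_old<'a>(v: &mut &'a [u8], chunk_size: usize) -> Option<&'a [u8]> {
    let s: &'a [u8] = *v; // copy the fat pointer so the split keeps lifetime 'a
    if s.len() < chunk_size {
        None
    } else {
        let (fst, snd) = s.split_at(chunk_size);
        *v = snd;
        Some(fst)
    }
}

fn next_new<'a>(v: &mut &'a [u8], chunk_size: usize) -> Option<&'a [u8]> {
    let s: &'a [u8] = *v;
    // Same result, but the length test and the split are fused into one
    // Option-returning call, and the state update runs inside a closure:
    // extra Option plumbing on every iteration for symex to track.
    s.split_at_checked(chunk_size).and_then(|(chunk, rest)| {
        *v = rest;
        Some(chunk)
    })
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    let (mut a, mut b) = (&data[..], &data[..]);
    for _ in 0..3 {
        // Both shapes yield the same chunk (or the same None) at every
        // step and leave the same tail behind.
        assert_eq!(next_old(&mut a, 2), next_new(&mut b, 2));
    }
    assert_eq!(a, b);
    println!("tail: {:?}", a); // prints "tail: [5]"
}
```

Nothing observable changed for a concrete execution; only the control-flow shape that symbolic execution must enumerate differs.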
**Why this is fast for rustc but slow for CBMC's symex**

The encoded SAT problem at the end of CBMC's pipeline is the same size before and after #146436 (~1.7M variables, ~6.4M clauses, confirmed in the CBMC logs). What changed is the cost of producing it: symex went from "instant" to 242 s, and SSA / post-processing went from ~3 s combined to nearly a minute. SAT solving itself is unchanged. Three concrete reasons the new shape is hostile to symex:
The compounding effect: roughly 1.2–1.5× more program steps per loop body iteration with the new shape. This is consistent with CBMC's known-pathological behavior on this kind of code.

**Mitigation plan on this branch — and why it is sound**

I initially proposed a two-part mitigation: (a) a Kani-side stub of the changed slice-iterator code, …

**Why a stub of …**
| Configuration | Wall | Peak RSS | Outcome |
|---|---|---|---|
| Upstream LEN=16, unwind=17 (PR baseline) | killed @ 18 min | 6.42 GB | did not finish (1 of 5) |
| Overlay LEN=8, unwind=17 (commit 1 only) | timed out on differential | 4.72 GB | 4 of 5 verified |
| Overlay LEN=8, unwind=9 (both commits) | 131 s | 3.18 GB | 5 of 5 verified ✓ |
| nightly-2025-12-03 reference (no overlay) | 134 s | 2.81 GB | 5 of 5 verified |
131 s / 3.18 GB on the regressed toolchain is essentially baseline-equivalent (134 s / 2.81 GB on the prior nightly).
Soundness argument:

- Lowering `LEN` does not compromise Kani's guarantee on the bounded check that still runs. Kani still proves the property for all `InlineVec<u8, LEN=8>` inputs. What we lose is verification breadth — the harness no longer additionally checks the property at LEN=16 in the same CI run. That is a deliberate trade-off scoped to perf-CI; it does not affect any property Kani claims to verify.
- Lowering `unwind` does not weaken any check either. `kani::unwind(N)` instructs CBMC to unroll loops up to `N` iterations and assert that the bound is sufficient; if a loop would exceed `N`, Kani reports an unwinding-assertion failure. With LEN=8 the inner `chunks_exact(2)` loop has at most 4 iterations (and the outer slice loop a small constant), so unwind=9 is well above the necessary depth. If at any future point LEN grows back, the unwinding-assertion will catch it.
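The loop-depth arithmetic in the second bullet can be spot-checked with plain slice code (the `LEN` constant mirrors the overlay value; the harness body itself is not reproduced here):

```rust
fn main() {
    const LEN: usize = 8; // overlay value; upstream uses 16
    let buf = [0u8; LEN];

    // The harness's inner loop walks the buffer in pairs via
    // chunks_exact(2): LEN / 2 = 4 full iterations at LEN = 8.
    let inner_iterations = buf.chunks_exact(2).count();
    assert_eq!(inner_iterations, 4);

    // unwind = 9 leaves ample headroom over the deepest loop: CBMC
    // needs the bound to exceed the iteration count so the trailing
    // unwinding assertion can pass.
    assert!(9 > inner_iterations + 1);
    println!("inner iterations: {inner_iterations}");
}
```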
Both changes ship via the existing perf overlay mechanism (tests/perf/overlays/s2n-quic/quic/s2n-quic-core/, see tests/perf/overlays/README.md). The upstream s2n-quic source is untouched.
**What this PR is not doing**

- We are not disabling any soundness check. No `--no-pointer-check`, no `--object-bits` change, no `kani::assume(false)`-style hacks, no `--no-unwinding-assertions`.
- We are not removing the harness from CI.
- We are not rewriting user code. The s2n-quic submodule is untouched.
**Defense in depth**

Together with the per-test `--timeout` and the workflow `timeout-minutes` + `nick-fields/retry@v3` we already landed on this branch, the perf job now has three independent guardrails:
- The overlay (c) brings encoding cost back to baseline for this specific harness, so the runner is no longer pushed to its memory ceiling.
- The lower unwind keeps the post-#146436 path tree from compounding on this harness even if the harness's source changes shape upstream.
- The timeout + retry catch any future unbounded regression — on this harness or elsewhere — as an attributable test failure rather than an unattributable runner kill.
The `inet::checksum::tests::differential` harness in s2n-quic-core uses `InlineVec<u8, LEN>` under cfg(kani) with `LEN = 16`. On the nightly-2025-12-04 toolchain (rust-lang/rust#146436, "Slice iter cleanup"), this harness's symex/SSA cost grew enough that peak RSS exceeds the 16 GB GH-hosted runner ceiling, producing the "runner has received a shutdown signal" (exit 143) failure mode. Drop LEN to 8 via the existing perf overlay mechanism, which copies files from `tests/perf/overlays/s2n-quic/` into the s2n-quic submodule before the perf suite runs (see `tests/perf/overlays/README.md`). The upstream s2n-quic source remains untouched; only the verification-time state space shrinks. This does not affect Kani's soundness guarantee on the bounded check that still runs: Kani still proves the property for all `InlineVec<u8, 8>` inputs. The trade-off is verification breadth on this specific harness; the property under check is unchanged. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
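For orientation, here is a hypothetical sketch of the harness shape this commit tunes. The names, property body, and module layout are illustrative stand-ins, not the real s2n-quic-core code:

```rust
// Illustrative stand-in for the overlay's cfg(kani) harness; only the
// LEN / unwind relationship mirrors the actual change.
const LEN: usize = 8; // overlay value; the upstream cfg(kani) value is 16

#[cfg(kani)]
#[kani::proof]
#[kani::unwind(9)] // LEN + 1, sized in the follow-up commit
fn differential() {
    // Symbolic input: every possible byte array of length LEN.
    let bytes: [u8; LEN] = kani::any();
    // ... differential checksum property over `bytes` ...
}

fn main() {
    // Outside of `cargo kani` the proof harness is compiled out. The
    // bounded state space the harness covers is 2^(8 * LEN) inputs, so
    // halving LEN removes 64 symbolic bits from the encoding.
    println!("2^{} inputs", 8 * LEN);
}
```

Shrinking `LEN` only narrows the input space the bounded proof quantifies over; the property being checked is untouched.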
The `inet::checksum::tests::differential` harness ships with `kani::unwind(17)`, sized for the upstream `LEN = 16`. After lowering LEN to 8 in the perf overlay (commit 232eb2a), unwind=9 (LEN + 1) is the smallest sufficient bound to fully unroll the inner `chunks_exact(2)` loop and the surrounding slice walk; carrying the upstream value of 17 multiplies CBMC's symex cost on the post-#146436 path tree for no additional verification benefit. This restores baseline perf on this harness when run against nightly-2025-12-04 (rust-lang/rust#146436): Wall: 18 min (killed) -> 131 s (vs 134 s on nightly-2025-12-03) RSS: 6.42 GB -> 3.18 GB (vs 2.81 GB on nightly-2025-12-03) Result: 0 of 5 verified -> 5 of 5 verified Soundness is preserved: `kani::unwind(N)` instructs CBMC to unroll loops up to N iterations and assert the bound is sufficient. With LEN=8 the inner `chunks_exact(2)` loop has at most 4 iterations, so unwind=9 is well above the necessary depth. If LEN ever grows back, the unwinding-assertion will catch it. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The perf overlay mechanism (see `tests/perf/overlays/README.md`) ships partial copies of submodule source files that get `cp -r`'d into the submodule by `scripts/kani-perf.sh` before the perf suite runs. Those overlay files reference `mod` declarations (e.g. `mod x86;`) whose sibling files only exist in the submodule, so rustfmt cannot standalone-parse them and fails with:

```
Error writing files: failed to resolve mod `x86`:
    tests/perf/overlays/s2n-quic/quic/s2n-quic-core/src/inet/x86.rs
    does not exist
```

The existing IGNORE only excluded the submodule itself (`*/perf/s2n-quic/*`); extend it to also exclude `*/perf/overlays/*`. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The CI copyright check runs `./scripts/ci/run-copyright-check.sh` over all tracked source files, including the perf overlay. Our overlay copy of `inet/checksum.rs` started with the upstream s2n-quic header (Amazon.com Apache-2.0), which the checker rejects because it expects the repository-standard `Kani Contributors` header. Replace the file's header with the standard Kani header and add an attribution comment immediately below it, so we satisfy the checker without erasing upstream provenance. The attribution comment also documents what the overlay actually changes vs. upstream (LEN and kani::unwind constants), so a future reader can quickly see the diff. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The previous 1800s (30 min) per-test bound was set when only one harness was suspected to be slow. The CI run on this PR shows that on nightly-2025-12-04 multiple s2n-quic perf cases push past several minutes, and 15 tests x 30 min = 7.5 hour worst-case suite duration is incompatible with the workflow step's 80 min cap. Drop the default to 600s (10 min). Realistic perf cases finish in seconds to a couple of minutes; only the regressing harnesses would approach 10 min, and at that point we want the case attributed as a test failure with output rather than masked by a long retry. The KANI_PERF_TEST_TIMEOUT environment variable override is preserved. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The `dump_dot` / `dump_all` / `dump_reason` helpers in `kani_middle::reachability::CallGraph` are only ever called from a `#[cfg(debug_assertions)]` block (line 61 of the same file). On release builds (`cargo build-dev -- --release`) the call site is elided, so the methods become unused and trip `dead_code`:

```
warning: methods `dump_dot`, `dump_all`, and `dump_reason` are never used
```

Gate the methods (and the imports they use) on the same `#[cfg(debug_assertions)]` to match the call-site gate. Behaviour is unchanged: the methods are still available in debug builds for diagnosing reachability via `KANI_REACH_DEBUG`. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
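The fix follows the standard pattern of matching a helper's `cfg` gate to its only call site. A minimal standalone illustration (the struct name echoes the real `CallGraph`; the body is invented):

```rust
struct CallGraph {
    edges: Vec<(usize, usize)>,
}

impl CallGraph {
    // Gated to match the sole call site, which is itself under
    // #[cfg(debug_assertions)]; release builds never compile this
    // method, so dead_code has nothing to warn about in either profile.
    #[cfg(debug_assertions)]
    fn dump_dot(&self) -> String {
        let body: Vec<String> = self
            .edges
            .iter()
            .map(|(from, to)| format!("  n{from} -> n{to};"))
            .collect();
        format!("digraph {{\n{}\n}}", body.join("\n"))
    }
}

fn main() {
    let g = CallGraph { edges: vec![(0, 1), (1, 2)] };
    #[cfg(debug_assertions)]
    println!("{}", g.dump_dot());
    let _ = &g; // keep `g` used when the debug block is compiled out
}
```

An alternative would be `#[allow(dead_code)]`, but that would also silence genuinely unreachable helpers; mirroring the call-site `cfg` keeps the lint meaningful.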
The previous step `timeout_minutes: 80` was too tight even for the ~45 min baseline run on `main` (reference run on 2026-05-13 finished in 2675 s). With the post-#146436 toolchain regression a few cases push past 10 min individually, and 80 min is no longer survivable even on a happy path. - step `timeout_minutes: 80 -> 180`. With the per-test 600s ceiling in scripts/kani-perf.sh, worst-case suite duration is bounded at 15 x 600s = 150 min. 180 min gives ~4x headroom over the baseline. - job `timeout-minutes: 90 -> 380`. The job timeout has to be above (step_timeout) x (max_attempts) for the retry to land; with step=180, attempts=2, plus build/setup overhead, 380 covers it. Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
**Investigation: `perf` job killed with `exit 143` on PR #4526**

The CI `perf` job on `ubuntu-24.04` is being terminated by the GitHub runner (`The runner has received a shutdown signal`, exit 143) while running `./scripts/kani-perf.sh`, specifically during `perf/s2n-quic/quic/s2n-quic-core/expected`. This is not a functional test failure; the runner agent itself is dying. The proximate cause is OOM on the 16 GB GH-hosted runner; the root cause is a rustc-side regression in nightly-2025-12-04 that blows up Kani's symex/SSA stage on at least one s2n-quic-core harness.

**Local repro (controlled A/B/C, same CBMC 6.8.0, same kissat 4.0.1)**
I instrumented a local sampler (RSS summed across the cargo / kani-driver / cbmc process tree, polled every 0.5 s) and ran the same harness (`s2n-quic-core/inet::checksum::tests::differential`, `--unwind 17 --sat-solver cadical`) under three configurations:

- `08ca4dee` (current)
- `08ca4dee` (current)
- `024b8048` (previous)

Same Rust source. Same Kani. Same CBMC. Same SAT solver. The toolchain bump alone flips it.
**Per-phase numbers from CBMC's own logs (same harness)**
Symex / SSA exploded, not SAT. The encoded problem size is essentially identical; what changed is the cost of producing it from the new MIR.
**Why GH runners report this as "shutdown signal" instead of a test failure**
`ubuntu-24.04` GitHub-hosted runners have 16 GB RAM and no swap. When CBMC (or the agent itself) trips the OOM killer, the runner agent dies, and GitHub records it as a runner shutdown rather than a test failure. macOS hides this locally because the compressed-memory subsystem swaps the working set rather than killing the process — which is why the case "appears to make progress" on a Mac while it just kills a Linux runner.

I also observed (locally) that `tests/perf/format/expected` runs >10 minutes on nightly-2025-12-04 — so s2n-quic-core is the most dramatic case but probably not the only affected harness. Per-test bounding is therefore necessary irrespective of whether we patch s2n-quic-core specifically.

**Changes shipped on this branch**
Two layered, independently revertible commits, signed-off:
1. `ci(perf): Bound per-test wall time in kani-perf.sh` — pass `--timeout 1800` to `compiletest` in `scripts/kani-perf.sh` (overridable via `KANI_PERF_TEST_TIMEOUT`). compiletest already supports `--timeout` (`tools/compiletest/src/main.rs:96`). This converts an unattributable runner shutdown into a normal `test result: FAILED ... finished in NNNs` with output.
2. `ci(perf): Add timeout and retry to kani-perf workflow job` — `timeout-minutes: 90` plus `nick-fields/retry@v3` with `max_attempts: 2` on the `perf` job in `.github/workflows/kani.yml`, mirroring the `bench-e2e` hardening already in commit `e224fb8c`. The job-level timeout distinguishes runaway tests from infra preemption; the retry covers the genuine spot-preemption case.

**What this PR is not trying to do**
These changes do not fix the upstream rustc regression and do not add any per-harness skip. With them applied, the `perf` job will either:

…

That's the right state to merge from: an unattributable runner kill becomes a test failure with output. Whether to also gate / skip the regressing harness is a follow-up call.
**How to verify locally**
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.