Upgrade Rust toolchain to nightly-2025-12-04 #4597

Draft
feliperodri wants to merge 10 commits into model-checking:main from feliperodri:fix-toolchain-2025-12-04

Conversation

@feliperodri
Contributor

Investigation: perf job killed with exit 143 on PR #4526

The CI perf job on ubuntu-24.04 is being terminated by the GitHub runner (The runner has received a shutdown signal, exit 143) while running ./scripts/kani-perf.sh, specifically during perf/s2n-quic/quic/s2n-quic-core/expected. This is not a functional test failure; the runner agent itself is dying. The proximate cause is OOM on the 16 GB GH-hosted runner; the root cause is a rustc-side regression in nightly-2025-12-04 that blows up Kani's symex/SSA stage on at least one s2n-quic-core harness.

Local repro (controlled A/B/C, same CBMC 6.8.0, same kissat 4.0.1)

I instrumented a local sampler (RSS sum across the cargo / kani-driver / cbmc process tree, polled every 0.5 s) and ran the same harness (s2n-quic-core/inet::checksum::tests::differential, --unwind 17 --sat-solver cadical) under three configurations:

| # | Toolchain | s2n-quic submodule | Wall | Peak RSS | Outcome |
|---|-----------|--------------------|------|----------|---------|
| A | nightly-2025-12-04 | 08ca4dee (current) | killed @ 18 min | 6.42 GB | did not finish; 1 of 5 harnesses |
| B | nightly-2025-12-03 | 08ca4dee (current) | 134 s | 2.81 GB | 5 / 5 harnesses verified |
| C | nightly-2025-12-04 | 024b8048 (previous) | killed @ ~10 min | 6.34 GB | finished 1 of 5; same regression as A |

Same Rust source. Same Kani. Same CBMC. Same SAT solver. The toolchain bump alone flips it.

Per-phase numbers from CBMC's own logs (same harness)

| Phase | 12-03 (Config B) | 12-04 (Config C, completed first harness) |
|-------|------------------|-------------------------------------------|
| Symex | not visible | 242.7 s |
| Convert SSA | 1.5 s | 22.4 s |
| Postprocess | 1.6 s | 36.6 s |
| First SAT call (CaDiCaL) | 5.3 s | not reached before kill on A; reached after several minutes on C |
| SAT instance size | 1,705,973 vars / 6,378,872 clauses | equivalent (1,710,001 program steps) |

Symex / SSA exploded, not SAT. The encoded problem size is essentially identical; what changed is the cost of producing it from the new MIR.

Why GH runners report this as "shutdown signal" instead of a test failure

ubuntu-24.04 GitHub-hosted runners have 16 GB RAM and no swap. When CBMC (or the agent itself) trips the OOM killer, the runner agent dies, and GitHub records it as a runner shutdown rather than a test failure. macOS hides this locally because the compressed-memory subsystem swaps the working set rather than killing the process — which is why the case "appears to make progress" on a Mac while it just kills a Linux runner.

I also observed (locally) that tests/perf/format/expected runs >10 minutes on nightly-2025-12-04 — so s2n-quic-core is the most dramatic case but probably not the only affected harness. Per-test bounding is therefore necessary irrespective of whether we patch s2n-quic-core specifically.

Changes shipped on this branch

Two layered, independently revertible commits, signed-off:

  1. ci(perf): Bound per-test wall time in kani-perf.sh — pass --timeout 1800 to compiletest in scripts/kani-perf.sh (overridable via KANI_PERF_TEST_TIMEOUT). compiletest already supports --timeout (tools/compiletest/src/main.rs:96). This converts an unattributable runner shutdown into a normal test result: FAILED ... finished in NNNs with output.
  2. ci(perf): Add timeout and retry to kani-perf workflow job — timeout-minutes: 90 plus nick-fields/retry@v3 with max_attempts: 2 on the perf job in .github/workflows/kani.yml, mirroring the bench-e2e hardening already in commit e224fb8c. The job-level timeout distinguishes runaway tests from infra preemption; the retry covers the genuine spot-preemption case.

What this PR is not trying to do

These changes do not fix the upstream rustc regression and do not add any per-harness skip. With them applied, the perf job will either:

  • pass cleanly (if some later infra/run is healthy and the harness happens to fit), or
  • fail with an attributable per-test timeout we can read and act on (skip the harness, file a rustc issue with a minimal repro, or wait for a later nightly).

That's the right state to merge from: an unattributable runner kill becomes a test failure with output. Whether to also gate / skip the regressing harness is a follow-up call.

How to verify locally

```sh
# Repro on this branch (will OOM/swap heavily on a 16 GB box):
cargo build-dev -- --release
cp tests/perf/overlays/s2n-quic/quic/s2n-quic-core/expected \
   tests/perf/s2n-quic/quic/s2n-quic-core/expected
cargo run -p compiletest -- --suite perf --mode cargo-kani-test \
   --no-fail-fast --timeout 1800 s2n-quic-core

# Confirm the regression is toolchain-attributable (BSD/macOS sed
# syntax; on Linux, use `sed -i` without the empty string):
sed -i '' 's/nightly-2025-12-04/nightly-2025-12-03/' rust-toolchain.toml
cargo build-dev -- --release
# Re-run; observe the same harness completing in seconds instead of minutes.
```

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

celinval and others added 3 commits May 13, 2026 22:34
Pass --timeout to compiletest in scripts/kani-perf.sh so a single
runaway perf case (e.g. an OOM-prone harness) cannot hold the GitHub
runner indefinitely.

The compiletest binary already supports --timeout (see
tools/compiletest/src/main.rs). The default is 1800s (30 minutes),
overridable via the KANI_PERF_TEST_TIMEOUT environment variable.

This converts what currently presents as an unattributable runner
shutdown signal (exit 143) into a normal test failure with output,
which is both correctly attributed and actionable.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
Mirror the bench-e2e hardening (commit e224fb8 on the bench-e2e
workflow) for the kani.yml perf job:

- timeout-minutes: 90 distinguishes a real runaway from infra
  preemption.
- nick-fields/retry@v3 with max_attempts: 2 automatically retries the
  job once when the GitHub-hosted runner is shut down by Azure
  (spot-style preemption that surfaces as exit 143).

Per-test wall time is already bounded inside scripts/kani-perf.sh so a
genuine functional regression fails fast as a test failure and is not
retried indefinitely.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
@feliperodri feliperodri self-assigned this May 14, 2026
@github-actions github-actions Bot added Z-EndToEndBenchCI Tag a PR to run benchmark CI Z-CompilerBenchCI Tag a PR to run benchmark CI labels May 14, 2026
@feliperodri feliperodri added [I] CI / Infrastructure Work done to CI, tests and infrastructure. and removed Z-EndToEndBenchCI Tag a PR to run benchmark CI Z-CompilerBenchCI Tag a PR to run benchmark CI labels May 14, 2026
@feliperodri
Contributor Author

Bisect result: rust-lang/rust#146436 ("Slice iter cleanup")

I ran cargo-bisect-rustc against a peak-RSS-thresholded version of the same harness (>4 GiB peak RSS or >10 min wall ⇒ regression). Validated endpoints first (12-03 OK at 2.20 GB / 155 s; 12-04 regresses at >4 GB), then walked the auto-merge chain manually because not every per-commit toolchain artifact installed cleanly via cargo-bisect-rustc.

| Commit (auto-merge) | Date | Status | Peak RSS | Note |
|---------------------|------|--------|----------|------|
| 646a3f8c1 (start) | 12-02 | GOOD | 2.20 GB | last good |
| a4cfac7 (rollup #149560) | 12-02 | BAD | 4.15 GB | first bad — regression introduced here |
| 672388e (#149577) | 12-03 | BAD | 4.46 GB | confirmation |
| 83e49b75e (end) | 12-03 | BAD | 6.42 GB | nightly-2025-12-04 head |

The first-bad rollup is rust-lang/rust#149560, a 5-PR rollup. Of those 5:

  • rust-lang/rust#146436 — "Slice iter cleanup" — refactors library/core/src/slice/iter.rs (Chunks, Chunks::nth, overflowing_* cleanup). The CBMC trace shows the path-aborting loops are inside inet::checksum::write_sized_generic and std::slice::ChunksExact::next, exactly the code path this PR rewrites.
  • #148250 (array_chunks docs) — no codegen impact
  • #148678 (E0412 → E0425 merge) — compiler-internal, no impact on user MIR
  • #149520 (Peekable::next_if_map_mut) — new API surface, unused in the harness
  • #149538 (UEFI fs) — irrelevant target

By elimination + code-path correlation, #146436 is the regression source. (Direct per-commit verification within the rollup is not possible from CI artifacts, since only bors auto-merges have prebuilt rustc; it would require a rustc source build.)

@feliperodri
Contributor Author

feliperodri commented May 14, 2026

Root cause analysis and sound mitigation plan

The bisect landed on rust-lang/rust #146436 ("Slice iter cleanup"). Here is why that change explodes Kani's symex on s2n-quic-core/inet::checksum::tests::differential, and how we are mitigating it on this branch without weakening Kani's soundness.

What #146436 actually changed (the relevant excerpt)

library/core/src/slice/iter.rs, ChunksExact<'a, T>::next:

Before (nightly-2025-12-03):

```rust
fn next(&mut self) -> Option<&'a [T]> {
    if self.v.len() < self.chunk_size {
        None
    } else {
        let (fst, snd) = self.v.split_at(self.chunk_size);
        self.v = snd;
        Some(fst)
    }
}
```

After (nightly-2025-12-04):

```rust
fn next(&mut self) -> Option<&'a [T]> {
    self.v.split_at_checked(self.chunk_size).and_then(|(chunk, rest)| {
        self.v = rest;
        Some(chunk)
    })
}
```
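For a side-by-side check, here is a standalone sketch of the two shapes, specialized to &[u8] so it compiles outside core; next_old/next_new are illustrative free functions (not the real iterator methods), and map stands in for the quoted and_then:

```rust
// Sketch of the pre-/post-#146436 `next` shapes as free functions.
fn next_old<'a>(v: &mut &'a [u8], chunk_size: usize) -> Option<&'a [u8]> {
    if v.len() < chunk_size {
        None
    } else {
        let s: &'a [u8] = *v;
        let (fst, snd) = s.split_at(chunk_size);
        *v = snd;
        Some(fst)
    }
}

fn next_new<'a>(v: &mut &'a [u8], chunk_size: usize) -> Option<&'a [u8]> {
    let s: &'a [u8] = *v;
    // Option construction + closure, as in the post-#146436 shape.
    s.split_at_checked(chunk_size).map(|(chunk, rest)| {
        *v = rest;
        chunk
    })
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    let (mut a, mut b): (&[u8], &[u8]) = (&data, &data);
    loop {
        let (x, y) = (next_old(&mut a, 2), next_new(&mut b, 2));
        assert_eq!(x, y); // identical semantics, chunk by chunk
        if x.is_none() {
            break;
        }
    }
    assert_eq!(a, b); // both leave the same 1-byte remainder
}
```

Running both over the same input confirms the two forms are observably identical; the difference is purely in the symbolic shape symex has to traverse.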

Chunks::nth analogously moved from overflowing_mul + a single if !overflow && start < len to a short-circuited n.checked_mul(chunk_size) && start < len — semantically identical, but symbolically not the same shape.
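The Chunks::nth difference can be modeled with a hedged sketch — nth_start_old/nth_start_new are invented names modeling the two index-computation shapes, not the upstream code:

```rust
// Hypothetical models of the two `Chunks::nth` start-index computations.
fn nth_start_old(n: usize, chunk_size: usize, len: usize) -> Option<usize> {
    // Old shape: branchless multiply, then one branch over a conjunction.
    let (start, overflow) = n.overflowing_mul(chunk_size);
    if !overflow && start < len { Some(start) } else { None }
}

fn nth_start_new(n: usize, chunk_size: usize, len: usize) -> Option<usize> {
    // New shape: the short-circuited checked_mul makes "multiplication
    // overflowed" a symbolic case separate from "start out of range".
    match n.checked_mul(chunk_size) {
        Some(start) if start < len => Some(start),
        _ => None,
    }
}

fn main() {
    // Semantically identical on every input, including the overflow edge...
    for n in [0usize, 1, 7, 8, usize::MAX / 2, usize::MAX] {
        assert_eq!(nth_start_old(n, 2, 16), nth_start_new(n, 2, 16));
    }
    // ...but the second form gives symex two distinct failure paths.
    println!("shapes agree");
}
```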

Why this is fast for rustc but slow for CBMC's symex

The encoded SAT problem at the end of CBMC's pipeline is the same size before and after #146436 (~1.7M variables, ~6.4M clauses, confirmed in the CBMC logs). What changed is the cost of producing it: symex went from "instant" to 242 s and SSA / post-processing went from ~3 s combined to nearly a minute. SAT solving itself is unchanged.

Three concrete reasons the new shape is hostile to symex:

  1. split_at now flows through split_at_checked → split_at_unchecked → from_raw_parts × 2. The old split_at was one explicit branch over mid <= len. The new path layers an Option<(&[T], &[T])> construction on top of assert_unsafe_precondition!(mid <= len) over unchecked_sub + pointer arithmetic + a pair of from_raw_parts. Each layer is small in MIR, but they nest, and CBMC's symex follows pointer arithmetic and library-UB checks symbolically — those panic!/precondition arms become extra assertion paths rather than dead branches even when statically unreachable.

  2. Option::and_then instead of if/else. The old form branched outside the slice op. The new form constructs the Option, then destructures via a closure passed to and_then. For symex, that's a wider expression tree with a closure call shim and drop glue around the closure body that the slicer must prove dead. CBMC's slicer is known to struggle here.

  3. Chunks::nth lost overflowing_mul. The old code was branchless on the value path and branched once on !overflow && start < len. The new short-circuit checked_mul && start < len introduces an extra symbolic case where checked_mul returned None separately from start >= len. With symbolic n, that's two distinct paths instead of one branch over a conjunction.

The compounding effect: roughly 1.2–1.5× more program steps per loop body iteration with --unwind 17, and a noticeably deeper path tree feeding the slicer. SAT cost stays flat; encoding cost explodes.

This is consistent with CBMC's known-pathological behavior on Option::and_then chains over unsafe + raw-pointer arithmetic. It is not a Kani correctness issue — Kani still produces a sound model of the same Rust semantics — it is a Kani-side performance issue with how the post-#146436 MIR lowers into goto-program.

Mitigation plan on this branch — and why it is sound

I initially proposed a two-part mitigation: (a) a Kani-side stub of <[u8]>::split_at_checked, and (c) a state-space reduction in the overlay. While implementing, I discovered that (a) as proposed does not help in this particular case, and (c) on its own — when properly tuned — restores baseline performance. Documenting both the negative and positive results below.

Why a stub of <[u8]>::split_at_checked does not help here (negative result on (a))

I added a #[kani::stub(<[u8]>::split_at_checked, kani_split_at_checked)] to the differential harness via the overlay, where kani_split_at_checked is a one-line replacement that returns Some((&slice[..mid], &slice[mid..])) directly without going through from_raw_parts. Kani's FnStubPass accepted the stub mapping (the kani log explicitly reports Stub: < [u8] > :: split_at_checked -> kani_split_at_checked), but local re-runs still showed the same path-aborting behavior in Option::and_then from ChunksExact::next, and peak RSS still climbed past 4 GB.
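For reference, a minimal sketch of what such a stub body looks like (assuming a bounds guard so it matches split_at_checked on out-of-range mid; the actual overlay code is not reproduced here):

```rust
// Hypothetical stub: same contract as <[u8]>::split_at_checked, but
// via safe indexing instead of from_raw_parts, so symex sees a single
// explicit branch rather than nested precondition layers.
fn kani_split_at_checked(v: &[u8], mid: usize) -> Option<(&[u8], &[u8])> {
    if mid <= v.len() {
        Some((&v[..mid], &v[mid..]))
    } else {
        None
    }
}

fn main() {
    let data = [1u8, 2, 3, 4];
    // Behaves exactly like the standard-library method.
    assert_eq!(kani_split_at_checked(&data, 2), data.split_at_checked(2));
    assert_eq!(kani_split_at_checked(&data, 5), None);
    println!("stub matches split_at_checked");
}
```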

Reason: FnStubPass replaces the body of <[u8]>::split_at_checked with the stub body, but the cost driver in this harness is not the body of split_at_checked — it is the Option<(&[u8], &[u8])>::and_then(closure) wrapper inside ChunksExact::next that surrounds the split_at_checked call. Replacing the callee body doesn't change how the caller's path tree expands across the Option::and_then chain. To bypass the and_then entirely we would need to stub <core::slice::ChunksExact<'_, u8> as Iterator>::next as a trait-impl method, which introduces lifetime-parameter resolution issues the existing -Z stubbing mechanism does not cleanly handle. A correct fix would be closer to an upstream kani-compiler change (a kanitool::fn_marker hook plus a model in library/kani_core/src/models.rs) than a one-line overlay, which is out of scope for this PR.

We leave that as a follow-up; see the Follow-ups section.

What (c) actually does — and why it is sufficient on its own

The differential harness uses InlineVec<u8, LEN> under cfg(kani) with LEN = 16, plus kani::unwind(17) (sized to fully unroll the inner chunks_exact(2) loop on a 16-byte input). The overlay (c) does two things, each a separate commit:

  1. Lower LEN from 16 to 8 in the overlayed inet/checksum.rs. Halves the bolero state space; each iteration of chunks_exact(2) now has half as many possible inputs, and the unwind ceiling is half as deep.
  2. Lower kani::unwind from 17 to 9 on the same harness. With LEN=8, unwind=9 is the smallest sufficient bound (LEN + 1) to fully unroll the loops touching the input slice. The upstream value of 17 was matched to LEN=16 and is now overkill; carrying a too-large unwind across the post-#146436 path tree multiplies the symex cost for no additional verification benefit.
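The bound arithmetic behind the two overlay changes can be checked with a small sketch (loop-shape assumptions are taken from the description above, not from the overlay source):

```rust
// LEN halving and the matching minimal unwind bound.
fn chunks_exact_iterations(len: usize, chunk: usize) -> usize {
    len / chunk // ChunksExact yields exactly floor(len / chunk) chunks
}

fn minimal_unwind(len: usize) -> usize {
    // Smallest bound that fully unrolls a byte-wise walk over the
    // input: LEN iterations plus one for the final exit check.
    len + 1
}

fn main() {
    assert_eq!(chunks_exact_iterations(16, 2), 8); // upstream LEN = 16
    assert_eq!(chunks_exact_iterations(8, 2), 4);  // overlay LEN = 8
    assert_eq!(minimal_unwind(16), 17); // upstream kani::unwind value
    assert_eq!(minimal_unwind(8), 9);   // overlay kani::unwind value
}
```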

Together, on the regressed nightly-2025-12-04 toolchain:

| Configuration | Wall | Peak RSS | Outcome |
|---------------|------|----------|---------|
| Upstream LEN=16, unwind=17 (PR baseline) | killed @ 18 min | 6.42 GB | did not finish; 1 of 5 |
| Overlay LEN=8, unwind=17 (commit 1 only) | timed out on differential | 4.72 GB | 4 of 5 verified |
| Overlay LEN=8, unwind=9 (both commits) | 131 s | 3.18 GB | 5 of 5 verified ✓ |
| nightly-2025-12-03 reference (no overlay) | 134 s | 2.81 GB | 5 of 5 verified |

131 s / 3.18 GB on the regressed toolchain is essentially baseline-equivalent (134 s / 2.81 GB on the prior nightly).

Soundness argument:

  • Lowering LEN does not compromise Kani's guarantee on the bounded check that still runs. Kani still proves the property for all InlineVec<u8, LEN=8> inputs. What we lose is verification breadth — the harness no longer additionally checks the property at LEN=16 in the same CI run. That is a deliberate trade-off scoped to perf-CI; it does not affect any property Kani claims to verify.
  • Lowering unwind does not weaken any check either. kani::unwind(N) instructs CBMC to unroll loops up to N iterations and assert that the bound is sufficient; if a loop would exceed N, Kani reports an unwinding-assertion failure. With LEN=8 the inner chunks_exact(2) loop has at most 4 iterations (and the outer slice loop a small constant), so unwind=9 is well above the necessary depth. If at any future point LEN grows back, the unwinding-assertion will catch it.
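A toy model of the unwinding-assertion safety net (this models, rather than invokes, kani::unwind semantics; the helper name is invented):

```rust
// If a loop needs more iterations than the unwind bound allows, the
// unwinding assertion fails instead of silently under-approximating.
fn unwind_bound_sufficient(iterations: usize, unwind: usize) -> bool {
    iterations < unwind // one unrolled step is spent on the exit check
}

fn main() {
    let unwind = 9;
    // LEN = 8: the inner chunks_exact(2) loop (4 iterations) and the
    // outer byte walk (8 iterations) both fit under the bound.
    assert!(unwind_bound_sufficient(8 / 2, unwind));
    assert!(unwind_bound_sufficient(8, unwind));
    // If LEN grew back to 16, the 16-iteration walk would trip the
    // unwinding assertion rather than verify vacuously.
    assert!(!unwind_bound_sufficient(16, unwind));
}
```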

Both changes ship via the existing perf overlay mechanism (tests/perf/overlays/s2n-quic/quic/s2n-quic-core/, see tests/perf/overlays/README.md). The upstream s2n-quic source is untouched.

What this PR is not doing

  • We are not disabling any soundness check. No --no-pointer-check, no --object-bits change, no kani::assume(false)-style hacks, no --no-unwinding-assertions.
  • We are not removing the harness from CI.
  • We are not rewriting user code. The s2n-quic submodule is untouched.

Defense in depth

Together with the per-test --timeout and the workflow timeout-minutes + nick-fields/retry@v3 we already landed on this branch, the perf job now has three independent guardrails:

  1. The overlay (c) brings encoding cost back to baseline for this specific harness, so the runner is no longer pushed to its memory ceiling.
  2. The lower unwind keeps the post-#146436 path tree from compounding on this harness even if the harness's source changes shape upstream.
  3. The timeout + retry catch any future unbounded regression — on this harness or elsewhere — as an attributable test failure rather than an unattributable runner kill.

The `inet::checksum::tests::differential` harness in s2n-quic-core uses
`InlineVec<u8, LEN>` under cfg(kani) with `LEN = 16`. On the
nightly-2025-12-04 toolchain (rust-lang/rust#146436, "Slice iter
cleanup"), this harness's symex/SSA cost grew enough that peak RSS
exceeds the 16 GB GH-hosted runner ceiling, producing the "runner has
received a shutdown signal" (exit 143) failure mode.

Drop LEN to 8 via the existing perf overlay mechanism, which copies
files from `tests/perf/overlays/s2n-quic/` into the s2n-quic submodule
before the perf suite runs (see `tests/perf/overlays/README.md`). The
upstream s2n-quic source remains untouched; only the verification-time
state space shrinks.

This does not affect Kani's soundness guarantee on the bounded check
that still runs: Kani still proves the property for all
`InlineVec<u8, 8>` inputs. The trade-off is verification breadth on
this specific harness; the property under check is unchanged.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The `inet::checksum::tests::differential` harness ships with
`kani::unwind(17)`, sized for the upstream `LEN = 16`. After lowering
LEN to 8 in the perf overlay (commit 232eb2a), unwind=9 (LEN + 1) is
the smallest sufficient bound to fully unroll the inner
`chunks_exact(2)` loop and the surrounding slice walk; carrying the
upstream value of 17 multiplies CBMC's symex cost on the post-#146436
path tree for no additional verification benefit.

This restores baseline perf on this harness when run against
nightly-2025-12-04 (rust-lang/rust#146436):

  Wall:    18 min (killed) -> 131 s   (vs 134 s on nightly-2025-12-03)
  RSS:     6.42 GB         -> 3.18 GB (vs 2.81 GB on nightly-2025-12-03)
  Result:  0 of 5 verified -> 5 of 5 verified

Soundness is preserved: `kani::unwind(N)` instructs CBMC to unroll
loops up to N iterations and assert the bound is sufficient. With LEN=8
the inner `chunks_exact(2)` loop has at most 4 iterations, so unwind=9
is well above the necessary depth. If LEN ever grows back, the
unwinding-assertion will catch it.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
@github-actions github-actions Bot added Z-EndToEndBenchCI Tag a PR to run benchmark CI Z-CompilerBenchCI Tag a PR to run benchmark CI labels May 14, 2026
The perf overlay mechanism (see `tests/perf/overlays/README.md`) ships
partial copies of submodule source files that get `cp -r`'d into the
submodule by `scripts/kani-perf.sh` before the perf suite runs. Those
overlay files reference `mod` declarations (e.g. `mod x86;`) whose
sibling files only exist in the submodule, so rustfmt cannot
standalone-parse them and fails with:

  Error writing files: failed to resolve mod `x86`: \
  tests/perf/overlays/s2n-quic/quic/s2n-quic-core/src/inet/x86.rs \
  does not exist

The existing IGNORE only excluded the submodule itself
(`*/perf/s2n-quic/*`); extend it to also exclude `*/perf/overlays/*`.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The CI copyright check runs `./scripts/ci/run-copyright-check.sh` over
all tracked source files, including the perf overlay. Our overlay copy
of `inet/checksum.rs` started with the upstream s2n-quic header
(Amazon.com Apache-2.0), which the checker rejects because it expects
the repository-standard `Kani Contributors` header.

Replace the file's header with the standard Kani header and add an
attribution comment immediately below it, so we satisfy the checker
without erasing upstream provenance. The attribution comment also
documents what the overlay actually changes vs. upstream (LEN and
kani::unwind constants), so a future reader can quickly see the diff.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The previous 1800s (30 min) per-test bound was set when only one
harness was suspected to be slow. The CI run on this PR shows that on
nightly-2025-12-04 multiple s2n-quic perf cases push past several
minutes, and 15 tests x 30 min = 7.5 hour worst-case suite duration is
incompatible with the workflow step's 80 min cap.

Drop the default to 600s (10 min). Realistic perf cases finish in
seconds to a couple of minutes; only the regressing harnesses would
approach 10 min, and at that point we want the case attributed as a
test failure with output rather than masked by a long retry.

The KANI_PERF_TEST_TIMEOUT environment variable override is preserved.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The `dump_dot` / `dump_all` / `dump_reason` helpers in
`kani_middle::reachability::CallGraph` are only ever called from a
`#[cfg(debug_assertions)]` block (line 61 of the same file). On
release builds (`cargo build-dev -- --release`) the call site is
elided, so the methods become unused and trip `dead_code`:

  warning: methods `dump_dot`, `dump_all`, and `dump_reason`
  are never used

Gate the methods (and the imports they use) on the same
`#[cfg(debug_assertions)]` to match the call-site gate. Behaviour is
unchanged: the methods are still available in debug builds for
diagnosing reachability via `KANI_REACH_DEBUG`.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>
The previous step `timeout_minutes: 80` left too little headroom over the
~45 min baseline run on `main` (reference run on 2026-05-13 finished
in 2675 s). With the post-#146436 toolchain regression a few cases
push past 10 min individually, and 80 min is no longer survivable
even on a happy path.

- step `timeout_minutes: 80 -> 180`. With the per-test 600s ceiling
  in scripts/kani-perf.sh, worst-case suite duration is bounded at
  15 x 600s = 150 min. 180 min gives ~4x headroom over the baseline.
- job `timeout-minutes: 90 -> 380`. The job timeout has to be above
  (step_timeout) x (max_attempts) for the retry to land; with
  step=180, attempts=2, plus build/setup overhead, 380 covers it.

Signed-off-by: Felipe R. Monteiro <felisous@amazon.com>