Retry npm install for disable wasm integration test#3674
Conversation
The Disable WASM integration shard was flaky because the test script ran a single bare `npm ci` before starting Hardhat. In the failed CI job, npm aborted with a transient `ECONNRESET` while fetching packages, so the shard failed before `DisableWasmTest.js` actually ran. Retry `npm ci` with a short backoff and remove any partial `node_modules` contents between attempts. Persistent install failures still fail the job, but temporary registry/network resets no longer make the test shard flaky.
PR SummaryLow Risk Overview Hardhat still runs only after a successful install; a third failed attempt exits with failure unchanged from before, aside from the extra resilience against transient registry errors (e.g. Reviewed by Cursor Bugbot for commit 9399910. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
There was a problem hiding this comment.
A small, correct CI hardening change that wraps npm ci in a 3-attempt retry loop with backoff to address a flaky integration shard. The shell logic correctly handles set -e, fails fast on persistent errors, and cleans partial installs between attempts.
Findings: 0 blocking | 3 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor's second-opinion review file (
cursor-review.md) was empty/produced no output; only Codex provided a pass (no material findings). The repoREVIEW_GUIDELINES.mdwas also empty, so no repo-specific standards were applied. - Optional: the same flaky-install pattern likely affects other integration scripts that run a bare
npm ci; consider extracting a shared retry helper rather than duplicating this loop if the pattern recurs. - Nit:
rm -rf node_modulesbetween attempts is largely redundant becausenpm cialready removesnode_modulesbefore installing — harmless, but worth noting it doesn't clear a corrupt npm cache, which is another common transient-failure cause (npm cache clean --forceor removing~/.npmcould help if ECONNRESET corrupts a cached tarball).
There was a problem hiding this comment.
LGTM, simple CI flake mitigation with retry/backoff for npm ci.
Extended reasoning...
Overview
This PR modifies a single CI shell script, integration_test/evm_module/scripts/disable_wasm.sh, to wrap npm ci in a retry loop (up to 3 attempts) with a 5/10-second linear backoff and node_modules cleanup between attempts. The motivation is a documented flaky CI run where npm aborted with a transient ECONNRESET before the actual Hardhat test could execute.
Security risks
None. This is a CI-only shell script not used at runtime. The retry logic does not introduce any new inputs, network endpoints, or privilege changes. rm -rf node_modules operates within the script's working directory (contracts/) so the blast radius is limited.
Level of scrutiny
Low. This is a mechanical, defensive change to a CI helper script — the kind of routine flake mitigation that does not warrant human review. The control flow is straightforward, set -e semantics are preserved (the if npm ci guards the failure correctly), and the final exit 1 ensures persistent failures still surface.
Other factors
The bug hunting system found no issues. The change is minimal, well-scoped, and the PR description clearly documents the symptom and remediation. No production code paths are touched.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3674 +/- ##
==========================================
- Coverage 59.28% 58.00% -1.28%
==========================================
Files 2272 2179 -93
Lines 188193 177388 -10805
==========================================
- Hits 111567 102902 -8665
+ Misses 66570 65352 -1218
+ Partials 10056 9134 -922
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
The PR wraps npm ci in a 3-attempt retry loop with linear backoff and node_modules cleanup to de-flake the Disable WASM integration shard. The logic is correct — npm ci runs inside an if so set -e doesn't abort early, the 3rd failure still exits non-zero, and the Hardhat invocation is unchanged.
Findings: 0 blocking | 2 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor's review (
cursor-review.md) produced no output (empty file); this synthesis is based on my own analysis plus Codex, which also reported no material findings. - Optional robustness nit: since the retry targets transient registry/network errors, you could add
npm cache clean --forcealongsiderm -rf node_modulesbetween attempts to also recover from a corrupted npm cache. Not required for correctness.
There was a problem hiding this comment.
LGTM — straightforward retry loop on npm ci to deflake the Disable WASM integration shard.
Extended reasoning...
Overview
This PR modifies a single integration test shell script, integration_test/evm_module/scripts/disable_wasm.sh, wrapping the existing npm ci invocation in a 3-attempt retry loop with backoff (5s, 10s) and rm -rf node_modules between attempts. If all three attempts fail, the script exits with status 1, preserving the prior failure semantics. set -e remains in effect, and npx hardhat test still only runs after a successful install.
Security risks
None. This is a CI-only shell script that runs during integration testing; it does not touch any application, chain, auth, crypto, or consensus code. The retry logic uses static integers and does not introduce any injection surface.
Level of scrutiny
Low. Test infrastructure flake fixes of this shape (retry a network-bound step, clean partial artifacts between attempts) are mechanical and well-understood. The scripted if npm ci; then break; fi pattern is compatible with set -e, and the third-attempt exit is explicit.
Other factors
No bugs were flagged by the bug hunting system. The Cursor Bugbot summary and CI checks (Buf) pass. The change is small, self-contained, and reverts cleanly if it doesn't help.
The Disable WASM integration shard was flaky because the test script ran a single bare
npm cibefore starting Hardhat. In the failed CI job, npm aborted with a transientECONNRESETwhile fetching packages, so the shard failed beforeDisableWasmTest.jsactually ran.Retry
npm ciwith a short backoff and remove any partialnode_modulescontents between attempts. Persistent install failures still fail the job, but temporary registry/network resets no longer make the test shard flaky.Flaked in unrelated changes.