ci: retry SonarCloud scan once on transient failure #21604
Merged
Pull request overview
This PR updates the SonarCloud GitHub Actions workflow to reduce merge-queue false positives caused by transient external download/service failures during the Sonar scan.
Changes:
- Runs the first SonarCloud scan attempt with `continue-on-error: true` and captures its outcome via a step `id`.
- If the first attempt fails, waits 90 seconds and retries the SonarCloud scan once.
- Keeps existing behavior for persistent failures (the job still fails if the retry fails) and leaves cache-warming-only runs unaffected.
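The retry flow in the list above can be sketched in shell. This is a minimal illustration under stated assumptions, not the workflow's actual YAML: the function name `run_with_one_retry` and the `RETRY_DELAY` variable are hypothetical, and the 90-second default only mirrors the spacing described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command; if it fails, wait and try exactly once more.
# RETRY_DELAY (seconds) is an assumed knob; the default of 90 mirrors the
# retry spacing described in the PR overview.
run_with_one_retry() {
  local delay="${RETRY_DELAY:-90}"
  if "$@"; then
    return 0              # first attempt succeeded, so no retry is needed
  fi
  echo "first attempt failed; retrying once in ${delay}s" >&2
  sleep "$delay"
  "$@"                    # the retry's exit status becomes the overall result
}
```

A persistent failure still fails the caller, because the second attempt's exit status is returned unchanged. In the workflow itself the same effect would come from a second step conditioned on the first step's recorded outcome, rather than from a shell function.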
taratorio
approved these changes
Jun 3, 2026
Sahil-4555
pushed a commit
to Sahil-4555/erigon
that referenced
this pull request
Jun 5, 2026
… jar (erigontech#21632)

## Problem

The sonar job pulls two artifacts from `scanner.sonarcloud.io` on every scan: a JRE ("JRE provisioning") and the scanner-engine jar. That CDN intermittently 403s GitHub-runner IPs, and the blocks outlive the spaced retry added in erigontech#21604:

- [CI Gate merge-queue run](https://github.com/erigontech/erigon/actions/runs/26994431020/job/79661154435): scan and retry both hit `HTTP 403 Forbidden` on the JRE tarball, failing the gate and bouncing erigontech#21562 out of the merge queue after the coverage suite had already passed.
- [CI Gate run on this PR](https://github.com/erigontech/erigon/actions/runs/27003084716/job/79687976307): with JRE provisioning eliminated, scan and retry both 403'd on the engine jar instead.

The 403s are IP-scoped blocking, not artifact availability: the exact jar URL that failed serves 200 from outside the runners (published Jun 1, still on the CDN), and `api.sonarcloud.io` answered fine in the same failing run. Only the `scanner.sonarcloud.io` host blocks, and for longer than the 90s retry spacing, so a same-runner retry cannot ride it out.

## Fix

Remove both per-scan dependencies on that host.

**JRE**: skip provisioning and point the scanner at the JDK already baked into the runner image, via the scanner's documented switches:

- `SONAR_SCANNER_SKIP_JRE_PROVISIONING=true`
- `SONAR_SCANNER_JAVA_EXE_PATH=$JAVA_HOME_21_X64/bin/java`

The ubuntu-24.04 image ships Temurin 21, the same major version Sonar provisions (the failing artifact was `OpenJDK21U-jre_...21.0.9`). The env vars go through `$GITHUB_ENV`, so both the scan and the retry step inherit them. `cleanup-space` in setup-erigon does not remove the preinstalled JDKs.

**Engine jar**: seed it into the actions cache from cache-warming push runs; PR and merge-queue scans restore it.

The seed step queries `api.sonarcloud.io/analysis/engine` (the host that stays reachable; returns `{filename, sha256, downloadUrl}`), downloads the jar with retries, sha256-verifies it, and saves it in the scanner's content-addressed download-cache layout: `~/.sonar/cache/<sha256>/<filename>`, with cache key `sonar-scanner-engine|<sha256>`. At scan time the bootstrapper asks the API for the prescribed sha and, finding it in the local cache, never contacts the CDN.

Verified end-to-end with scanner CLI 8.1.0.6389 against a cache seeded exactly as the workflow does it: the debug log shows the metadata call, zero requests to `scanner.sonarcloud.io`, and the engine launched straight from the cached jar.

Cache-warming runs on every push to main/release, and SonarCloud rotates engines every few days (12.37 published Jun 1, 12.38 on Jun 5), so seeds refresh within hours of a rotation.

## Failure modes considered

- Runner image drops Temurin 21: the `[ -x ... ]` guard leaves the env vars unset and the scanner falls back to downloading, i.e. current behavior.
- SonarCloud raises its minimum JRE above 21: the scan fails deterministically with a version error (historically preceded by months of deprecation warnings in the scan log); fix is bumping the env var to `JAVA_HOME_25_X64`, which the image already ships.
- Engine cache miss (version rotated since the last base-branch push, or cache evicted): the scanner falls back to the direct download plus the existing retry (today's behavior, never worse).
- Seeding fails (the CDN 403s the cache-warming runner too, or the bootstrap API contract changes): the lookup and download steps are `continue-on-error`, no cache is saved, and the next push to the branch retries; scans fall back as above.

The scanner CLI zip and GPG key still come per run from `binaries.sonarsource.com` and the keyserver; those hosts have not been the ones failing, and the existing retry covers them.
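The `[ -x ... ]` guard mentioned under the failure modes can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual step: the function name `use_preinstalled_jre` is hypothetical, while `JAVA_HOME_21_X64` and `GITHUB_ENV` are the runner-provided variables and the two `SONAR_SCANNER_*` switches are the ones named in the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the scanner's JRE switches only when the
# runner's JDK 21 java binary actually exists and is executable.
use_preinstalled_jre() {
  local java_exe="${JAVA_HOME_21_X64:-}/bin/java"
  # The non-empty check avoids accidentally matching a bare /bin/java.
  if [ -n "${JAVA_HOME_21_X64:-}" ] && [ -x "$java_exe" ]; then
    {
      echo "SONAR_SCANNER_SKIP_JRE_PROVISIONING=true"
      echo "SONAR_SCANNER_JAVA_EXE_PATH=$java_exe"
    } >> "$GITHUB_ENV"    # later steps in the same job inherit these
  else
    # Leave the variables unset so the scanner provisions its own JRE.
    echo "no executable JDK 21 found; scanner will download its JRE" >&2
  fi
}
```

Because nothing is written when the binary is missing, a changed runner image degrades to the download path instead of failing the job.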
Note: the first engine seed only materializes once this merges (cache-warming triggers on push to main), so this PR's own sonar runs still use the fallback download path.
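The verify-and-store part of the seed step can be sketched as below. It is a minimal sketch under stated assumptions: the function name `seed_engine_cache` and its argument order are hypothetical, the API lookup and the download with retries are left out, and `sha256sum` is assumed to be available as it is on Linux runners. The `<sha256>/<filename>` layout and the cache key format are taken from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a downloaded engine jar against the expected
# sha256, then store it as <cache_root>/<sha256>/<filename>.
# Usage: seed_engine_cache <jar> <expected_sha256> <filename> [cache_root]
seed_engine_cache() {
  local jar="$1" expected_sha="$2" filename="$3"
  local cache_root="${4:-$HOME/.sonar/cache}"
  local actual_sha
  actual_sha="$(sha256sum "$jar" | cut -d' ' -f1)"
  if [ "$actual_sha" != "$expected_sha" ]; then
    echo "sha256 mismatch: expected $expected_sha, got $actual_sha" >&2
    return 1              # refuse to seed an unverified jar
  fi
  mkdir -p "$cache_root/$expected_sha"
  cp "$jar" "$cache_root/$expected_sha/$filename"
  # Print the cache key in the format given in the description.
  echo "sonar-scanner-engine|$expected_sha"
}
```

On a checksum mismatch nothing is written, which matches the described behavior that a failed seed saves no cache and scans fall back to the direct download.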
Problem
Two merge-queue evictions in the last 3 weeks were caused by the SonarCloud scan failing to download the scanner CLI from `binaries.sonarsource.com`, not by anything in the queued code:
- `Unexpected HTTP response: 403` (the action does not retry 4xx): evicted "p2p/sentry: don't duplicate SendMessageById across shared-store sentries" #21597 from the queue with all sibling jobs green or still running
In the merge queue the sonar job fast-cancels the whole CI Gate run on failure, so a CDN blip cancels ~40 min of green sibling jobs and `github-merge-queue` removes the PR with `failed_checks`. Per CI-GUIDELINES.md, merge-queue checks must have no false positives; CDN weather is one.
Fix
Give the scan one spaced retry: the first attempt runs with `continue-on-error: true`, and if it fails the scan is retried once after a 90-second wait.
`cache-warming-only` runs are unaffected (scan skipped → outcome is `skipped`, so the retry steps skip too). A `continue-on-error` step reports `conclusion: success` to the jobs API, so ci-gate's root-cause detection won't flag a run recovered by the retry; a double failure is attributed to `SonarCloud scan (retry)`.
Alternatives considered
- The action's `tc.find()` lookup can never match SonarSource's 4-segment version string (`semver.clean("8.1.0.6389")` is `null`), so its internal tool-cache path is dead code on any runner.
- `scannerBinariesUrl`: removes the CDN dependency entirely but adds hosting and per-upgrade maintenance, and the GPG keyserver dependency remains. Can revisit if 403s persist despite the retry.
No tests: CI workflow YAML change (TDD not applicable); validated with `actionlint` and `make lint`.