
ci: retry upload-artifact via Wandalen/wretry.action #5975

Merged
Fedr merged 4 commits into master from cicd/retry-upload-artifacts
Apr 24, 2026

Fedr commented Apr 24, 2026

Summary

Wraps each actions/upload-artifact@v7 step in the Windows and macOS workflows with Wandalen/wretry.action so transient self-hosted-runner network flakes during artifact upload (typically ENOTFOUND on CreateArtifact) no longer fail the whole build.

actions/upload-artifact@v7 retries transient twirp Request timeouts internally but treats ENOTFOUND / DNS errors as fatal — a ~45-minute build is lost to a one-second DNS blip. Our self-hosted runners have exhibited this failure mode intermittently.

Change

6 actions/upload-artifact@v7 steps wrapped:

  • .github/workflows/build-test-macos.yml: Upload Macos Distribution + Upload NuGet files to Artifacts
  • .github/workflows/build-test-windows.yml: Upload Windows Binaries Archive + Upload MeshLibC2 headers archive + Upload NuGet files to Artifacts + Upload NuGet library DLL and XML to Artifacts

Each wrap:

- name: Upload …
  uses: Wandalen/wretry.action@e68c23e6309f2871ca8ae4763e7629b9c258e1ea # v3.8.0
  with:
    action: actions/upload-artifact@v7
    attempt_limit: 3
    attempt_delay: 30000 # milliseconds
    with: |
      name: …
      path: …
      retention-days: 1
      overwrite: true

3 attempts with a 30 s delay between them allow up to 60 s of waiting, which covers a typical 1-minute github.com blip. overwrite: true lets a retry replace a partial artifact atomically (upload-artifact@v4+ finalises atomically, so this is safe).
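The attempt/delay behaviour described above can be modelled with a small shell function. This is only an illustrative sketch, not how Wandalen/wretry.action is implemented; the names `retry_with_delay` and `max_wait_seconds` are hypothetical, and it assumes the delay is applied between attempts (so three attempts wait at most twice).

```shell
#!/usr/bin/env bash
# Illustrative model only; NOT the implementation of Wandalen/wretry.action.
# Assumption: the delay is applied between attempts, never after the last one.

# retry_with_delay LIMIT DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to LIMIT times, sleeping DELAY_SECONDS between attempts.
retry_with_delay() {
  local limit="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$limit" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Worst-case time spent waiting between attempts: (limit - 1) * delay.
max_wait_seconds() {
  echo $(( ($1 - 1) * $2 ))
}

max_wait_seconds 3 30   # prints 60
```

Under these assumptions, `retry_with_delay 3 30 some_command` would run `some_command` (a placeholder) at most three times and return its first success.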

Scope

Only the self-hosted-runner workflows (Windows + macOS arm64/Release matrix leg). The other upload-artifact sites run on GitHub-hosted runners and haven't exhibited this failure mode; disable-build-* labels suppress non-Windows / non-macOS CI in this PR.

Note: this PR intentionally does not wrap actions/checkout — a separate attempt to do so (#5969) hit a startup_failure when wretry's composite layer tried to dispatch around actions/checkout@v6. That PR now uses an in-workflow git-submodule retry loop instead. actions/upload-artifact@v7 doesn't trigger the same parse-time refusal.

Test plan

  • Windows and macOS CI on this PR complete normally (first-attempt upload succeeds; delays skipped).
  • A future build where CreateArtifact hits a transient 5xx / ENOTFOUND: retry kicks in, job still succeeds.

Fedr and others added 2 commits April 24, 2026 15:11
…low-list

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wrapped steps are self-documenting via the Wandalen/wretry.action
usage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fedr requested a review from Grantim April 24, 2026 12:54
Fedr merged commit f38b398 into master Apr 24, 2026
27 checks passed
Fedr deleted the cicd/retry-upload-artifacts branch April 24, 2026 13:15
Fedr added a commit that referenced this pull request Apr 30, 2026
Most of our Windows CI runs on github-hosted runners, where transient
DNS / connect failures are not an issue, so the wretry.action wrapper
that #5975 added isn't carrying its weight there.

It is also noisy: on Windows, every wretry-wrapped step emits

    Warning: Environment variable 'INPUT_GITHUB_CONTEXT' exceeds the
    maximum supported length. Environment variable length: 39289 ,
    Maximum supported length: 32766

because wretry forwards `toJson(github)` via `INPUT_GITHUB_CONTEXT`,
which exceeds Windows' 32767-char env var limit. See
Wandalen/wretry.action#192.

The macOS workflow keeps the wrapper, since those runners are
self-hosted.
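
The warning quoted in this commit message comes down to a length comparison. As a rough sketch, the function below reports whether a value fits under that limit. `fits_windows_env_limit` is a hypothetical helper name, and 32766 is taken from the warning text itself, not from the runner or wretry.action source.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the limit is the "Maximum supported length"
# quoted in the runner warning, not a value read from any source code.
MAX_ENV_VALUE_LENGTH=32766

# Succeeds if the given value is no longer than the limit.
fits_windows_env_limit() {
  local value="$1"
  [ "${#value}" -le "$MAX_ENV_VALUE_LENGTH" ]
}

# Build a 39289-character payload, the size reported in the warning.
payload=$(head -c 39289 /dev/zero | tr '\0' 'x')

if fits_windows_env_limit "$payload"; then
  echo "fits"
else
  echo "too long: ${#payload} > $MAX_ENV_VALUE_LENGTH"
fi
```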