gh-workflows: make cache hits optional in eve build jobs #5734

Closed
europaul wants to merge 3 commits into lf-edge:master from europaul:gh-workflows/optional-cache-for-eve-build

Contributor

@europaul europaul commented Apr 2, 2026

Description

GitHub Actions cache entries expire too quickly, so between the packages job
and the eve job they are often already gone. This causes the eve job to fail
on cache miss rather than just rebuilding.

Make the cache restore steps non-fatal by removing fail-on-cache-miss and
gating the docker-load steps on actual cache hits. Change the eve build step
to run make pkgs eve instead of just make eve so that if the cache was
not restored, packages are simply rebuilt.

Applied to both build.yml (PR builds) and buildondemand.yml (on-demand
builds). publish.yml is not affected since it pushes packages to Docker Hub
and the eve job pulls from the registry.
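
For illustration, the restore-then-gate pattern described above might look roughly like the following. This is a minimal sketch, not the actual diff: the step id `restore-pkgs`, the cache path, the key format, and the placeholder load command are assumptions, while `fail-on-cache-miss`, the `cache-hit` output, and `make pkgs eve` are the pieces named in the description.

```yaml
# Sketch only: step id, cache path, key, and the load command are hypothetical.
jobs:
  eve:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: restore package cache (a miss is not fatal)
        id: restore-pkgs
        uses: actions/cache/restore@v4
        with:
          path: ~/.linuxkit/cache                 # assumed cache location
          key: linuxkit-amd64-${{ github.sha }}   # assumed key format
          # fail-on-cache-miss is left out, so it keeps its default of false

      - name: load cached packages into docker
        # runs only when the exact key was restored
        if: steps.restore-pkgs.outputs.cache-hit == 'true'
        run: echo "load the restored images into docker here"

      - name: build eve
        # builds packages first, which covers the case where the cache was not restored
        run: make pkgs eve
```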

How to test and validate this PR

  • Trigger a PR build and verify that eve jobs succeed both when the cache is
    warm (packages job finishes quickly and cache is still valid) and when the
    cache has expired (eve job rebuilds packages via make pkgs eve).
  • Trigger an on-demand build (workflow_dispatch) and verify the same
    behavior for arm64/riscv64 matrix entries.

Changelog notes

No user-facing changes.

PR Backports

  • 16.0-stable: No, CI-only change.
  • 14.5-stable: No, CI-only change.
  • 13.4-stable: No, CI-only change.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR
  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

🤖 Generated with Claude Code

@github-actions github-actions Bot requested a review from uncleDecart April 2, 2026 14:06
@europaul europaul requested a review from rene April 2, 2026 14:08
Contributor

@rucoder rucoder left a comment

@europaul I do not think we need the check for riscv64; it is now no different from x86 or arm. It would also be nice to upload the cache even if some packages fail to build: in the hit case a re-run will be much faster.

Contributor

@eriknordmark eriknordmark left a comment

We seem to be getting
Error: Failed to restore cache entry. Exiting as fail-on-cache-miss is set. Input key: linuxkit-amd64-bfa34420ddb162ede9036e29fa8f6b360d47fdb1-generic-k
for many (but not all) of the PRs.
I don't know whether this will help or whether it will make the builds exceed the total time allowed on the runner.
But if we think it can help we can give it a try.

@europaul europaul force-pushed the gh-workflows/optional-cache-for-eve-build branch from 6749c1d to 3968b50 Compare April 8, 2026 16:05
@github-actions github-actions Bot requested a review from eriknordmark April 8, 2026 16:05
Contributor Author

europaul commented Apr 8, 2026

@europaul I do not think we need the check for riscv64; it is now no different from x86 or arm. It would also be nice to upload the cache even if some packages fail to build: in the hit case a re-run will be much faster.

@rucoder

  1. I don't think anything changed wrt riscv64 - we're still building it on amd64 runners, so we still need the special cases.
  2. I'm afraid that if we also add the failed packages to cache then it's gonna fill up even faster

europaul and others added 2 commits April 8, 2026 19:19
GitHub Actions cache entries expire too quickly, so between the
packages job and the eve job they are often already gone. This causes
the eve job to fail on cache miss rather than just rebuilding.

Make the cache restore steps non-fatal by removing fail-on-cache-miss
and gating the docker-load steps on actual cache hits. Change the eve
build step to run 'make pkgs eve' instead of just 'make eve' so that
if the cache was not restored, packages are simply rebuilt.

Add qemu setup to the eve job for riscv64, since it cross-builds on
amd64 runners and needs qemu if packages must be rebuilt from scratch.
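
One common way to register qemu emulation on an amd64 runner is `docker/setup-qemu-action`. Whether this commit uses that action or sets up qemu another way is not shown in this thread, so the step below is an assumed stand-in, and the `matrix.arch` key is hypothetical.

```yaml
# Sketch only: the commit's actual qemu step is not shown in the thread.
steps:
  - name: set up qemu for riscv64 cross-builds
    if: matrix.arch == 'riscv64'   # hypothetical matrix key
    uses: docker/setup-qemu-action@v3
    with:
      platforms: riscv64
```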

Applied to both build.yml (PR builds) and buildondemand.yml (on-demand
builds).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Gaiduk <paulg@zededa.com>
The paths-ignore filter excluded all of .github/**, which meant
changes to the build workflow itself would not trigger a PR build.
Add a negation pattern to re-include build.yml so the workflow
can validate its own changes.
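
As a rough illustration of re-including one file under an otherwise excluded directory, the sketch below uses the `paths` filter, where GitHub documents that a positive pattern after a `!` pattern includes the path again. It is a swapped-in equivalent, not the `paths-ignore` form this commit describes, and the trigger shown is an assumption.

```yaml
# Sketch only: uses `paths` with `!` patterns, not the commit's paths-ignore form.
on:
  pull_request:
    paths:
      - '**'                            # start from every file
      - '!.github/**'                   # drop everything under .github/
      - '.github/workflows/build.yml'   # re-include the build workflow itself
```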

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Gaiduk <paulg@zededa.com>
@europaul europaul force-pushed the gh-workflows/optional-cache-for-eve-build branch 2 times, most recently from fbcb710 to 14aad05 Compare April 8, 2026 17:46
Remove dead 'ensure zstd for cache' step in build.yml that referenced
nonexistent matrix.os with stale runner names.

Add .github/actionlint.yaml to whitelist custom self-hosted runner
labels (zededa-ubuntu-2204, zededa-ubuntu-2204-arm64, jumbo).
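
actionlint reads custom runner labels from the `self-hosted-runner.labels` list in its config file. A minimal sketch with the three labels named above (the real file may contain more settings):

```yaml
# .github/actionlint.yaml
self-hosted-runner:
  # labels actionlint should accept in `runs-on`
  labels:
    - zededa-ubuntu-2204
    - zededa-ubuntu-2204-arm64
    - jumbo
```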

Pass pull_request.head.ref through an environment variable instead of
using it directly in an inline script to prevent potential script
injection from malicious branch names.
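
The usual shape of this hardening is sketched below. The step name, the variable name `HEAD_REF`, and the `echo` command are hypothetical; only the use of `pull_request.head.ref` comes from the commit message.

```yaml
# Sketch only: step name, HEAD_REF, and the command are hypothetical.
steps:
  - name: report branch name
    env:
      # the expression is expanded into the environment, not into the script text
      HEAD_REF: ${{ github.event.pull_request.head.ref }}
    run: |
      # quoted shell variable: a branch name containing $(...) or ; stays data
      echo "building branch: $HEAD_REF"
```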

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Gaiduk <paulg@zededa.com>
@europaul europaul force-pushed the gh-workflows/optional-cache-for-eve-build branch from 14aad05 to 220e039 Compare April 8, 2026 17:49
inputs:
  command:
    required: true
    type: string
Contributor

Why is type getting removed?

Contributor Author

We hit this error from actionlint:

e:.github/workflows/buildondemand.yml:71:15: could not parse action metadata in "/github/workspace/.github/actions/run-make": line 6: unexpected key "type" for definition of input "clean" [action]

I think those fields were removed a long time ago; my local linter has always shown an error for them. Now that we have actionlint in CI, keeping them is an actual CI error.
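
For reference, input definitions in action metadata accept `description`, `required`, `default`, and `deprecationMessage`, with no `type` key (that key belongs to `workflow_call` and `workflow_dispatch` inputs). A minimal sketch of a composite action in that shape follows. Apart from the input names `command` and `clean`, which appear in this thread, the descriptions, defaults, and step are hypothetical and not the real run-make action.

```yaml
# Sketch only: not the real .github/actions/run-make/action.yml.
name: run-make
description: Run a make command (hypothetical description)
inputs:
  command:
    description: make arguments to run
    required: true          # no `type:` key here; action inputs are always strings
  clean:
    description: whether to clean up first
    required: false
    default: 'false'
runs:
  using: composite
  steps:
    - shell: bash
      env:
        MAKE_ARGS: ${{ inputs.command }}
      run: make $MAKE_ARGS   # unquoted on purpose so multiple arguments split
```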

        run: | # Runners must provide default credentials
          docker login

      - name: ensure packages for cross-arch build
Contributor

Why is this getting included again? We don't need it anymore since our runners are already prepared for binary emulation....

@europaul
Contributor Author

This PR is gonna be superseded by #5782

@europaul europaul closed this Apr 10, 2026