ci: cache kurtosis infra images and retry engine bootstrap #21602
Merged
Pull request overview
This PR hardens the Kurtosis-based CI workflows against transient Docker Hub outages by pre-caching Kurtosis infrastructure images and moving Kurtosis engine bootstrap into an explicit, retryable step before the (non-retryable) composite action runs.
Changes:
- Pin the Kurtosis CLI version and pass it into the assertoor action to avoid silent CLI/engine drift.
- Extend Docker image caching to include Kurtosis infra images (engine/core/files-artifacts-expander/vector/fluent-bit) and add retry-with-backoff for cache-miss pulls.
- Add a dedicated “Install Kurtosis CLI and start engine” step with retries so engine bootstrap is no longer hidden mid-action.
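A minimal sketch of the pinning and caching bullets above, assuming a bash `run:` block. The infra image list is derived from the pinned version, so bumping `KURTOSIS_VERSION` changes the engine/core/expander tags in one place. A second helper reports which images a cache restore did not provide. The helper names `kurtosis_infra_images` and `missing_images` are hypothetical and do not come from the PR's workflows. The image names and tags are the ones the PR lists for kurtosis 1.15.2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the infra-image cache; names are illustrative,
# not taken from the PR's workflows.
set -uo pipefail

# Print the Kurtosis infra images named in this PR, one per line.
# engine, core (APIC) and files-artifacts-expander are tagged with the CLI
# version; vector and fluent-bit use the fixed tags listed for kurtosis 1.15.2.
kurtosis_infra_images() {
  local version="$1"
  printf '%s\n' \
    "kurtosistech/engine:${version}" \
    "kurtosistech/core:${version}" \
    "kurtosistech/files-artifacts-expander:${version}" \
    "timberio/vector:0.45.0-debian" \
    "fluent/fluent-bit:4.0.0"
}

# Print each argument that is NOT in the local Docker image store, i.e. the
# images a cache restore did not provide and that would still need a pull.
missing_images() {
  local img
  for img in "$@"; do
    if ! docker image inspect "$img" >/dev/null 2>&1; then
      printf '%s\n' "$img"
    fi
  done
}
```

In a workflow step one might run `mapfile -t imgs < <(kurtosis_infra_images "$KURTOSIS_VERSION")`. After `docker load -i <tarball>` restores the cache (the path is a placeholder), `missing_images "${imgs[@]}"` lists what would still need a registry pull. The same array can be passed to `docker save -o <tarball> "${imgs[@]}"` when populating the cache. The check is analogous, at the CI level, to Kurtosis's `missing` image-download mode, which only pulls images that are absent locally.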
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/workflows/test-kurtosis-gloas.yml | Adds pinned Kurtosis version + infra image caching, pull retries, and a retryable engine bootstrap step before running the assertoor action. |
| .github/workflows/test-kurtosis-assertoor.yml | Adds conditional Docker Hub login for the matrix job, pins Kurtosis version + infra image caching, pull retries, and a retryable engine bootstrap step before running the assertoor action. |
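The table mentions pull retries and a retryable engine bootstrap step. Both can be sketched as one bash helper. This is an illustration under stated assumptions, not the PR's actual run block. The function name `retry`, its argument order, and the doubling delay are all hypothetical. The only Kurtosis commands referenced in the usage note are `kurtosis engine start` and `kurtosis engine stop`.

```shell
#!/usr/bin/env bash
# Hypothetical retry-with-backoff helper; name, argument order and delays are
# illustrative assumptions, not the PR's workflow code.
set -uo pipefail

# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_S CLEANUP COMMAND [ARGS...]
#   Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
#   After each failed attempt except the last, runs CLEANUP (a command or
#   function name; pass "" for none), sleeps, and doubles the delay.
#   Returns 0 on success, 1 once all attempts are exhausted.
retry() {
  local max="$1" delay="$2" cleanup="$3"
  shift 3
  local attempt
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max )); then
      echo "attempt ${attempt}/${max} failed: $*; retrying in ${delay}s" >&2
      if [[ -n "$cleanup" ]]; then
        # Best effort: a failed cleanup must not abort the retry loop.
        "$cleanup" || true
      fi
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  echo "giving up after ${max} attempts: $*" >&2
  return 1
}
```

A cache-miss pull might look like `retry 3 5 "" docker pull "$image"`. The engine step might be `retry 3 5 stop_engine kurtosis engine start`, with a hypothetical `stop_engine() { kurtosis engine stop; }` that tears down a half-started engine before the next attempt. Passing the cleanup as a command name rather than a string avoids `eval`.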
taratorio approved these changes on Jun 3, 2026
Problem
In this caplin-minimal kurtosis job, Docker Hub was unreachable from the runner. The Kurtosis engine bootstrap happens inside `kurtosis run`, mid-action, where it cannot be retried. It tried to pull `timberio/vector:0.45.0-debian` and timed out on `registry-1.docker.io`, failing the job before any test ran. The workflows already cache CL images (`docker save` / `actions/cache` / `docker load`) precisely to avoid Docker Hub exposure, but Kurtosis's own infrastructure images weren't covered.
Fix
Applied to `test-kurtosis-assertoor.yml` and `test-kurtosis-gloas.yml`.
Take Docker Hub off the critical path
- Pin `KURTOSIS_VERSION: 1.15.2` and pass it to the assertoor action via its `kurtosis_version` input. Previously the action installed whatever apt.fury.io serves (its default is `latest`), so the CLI version could drift silently, and the engine image tag with it.
- Extend the image cache to Kurtosis's infrastructure images: `kurtosistech/engine`, `kurtosistech/core` (APIC) and `kurtosistech/files-artifacts-expander` (all tagged with the CLI version), `timberio/vector:0.45.0-debian` (logs aggregator; the pull that failed) and `fluent/fluent-bit:4.0.0` (logs collector). Kurtosis uses the `missing` image-download mode, so pre-loaded images are used without any registry call.
Retry the cheap part
- Install the Kurtosis CLI and start the engine in a dedicated step, retried with `kurtosis engine stop` between attempts. The action reuses a running engine, so engine bootstrap moves out of the un-retryable composite action into a retryable ~15 s step. A registry blip no longer costs a 20+ minute test step.
- Retry cache-miss `docker pull`s 3× with backoff.
Notes
- Caches are not saved from `pull_request` events, so one `workflow_dispatch` run after merge warms the new cache.
- Not covered: `ethereum-genesis-generator` (version owned by the package branch; falls back to a normal pull) and `qa-txpool-performance-test.yml` (erigontech fork of the action on self-hosted runners with persistent local image caches).
- Validated with `actionlint` (same flags as the lint workflow) and `shellcheck` on the new run blocks; image names/tags verified against the kurtosis 1.15.2 sources.