Run Windows-only tests on a GitHub Actions runner (on demand) #1245

Merged: jimmyp merged 12 commits into main from jimpelletier/windows-test-runner on Jun 2, 2026

@jimmyp jimmyp commented Jun 1, 2026

Background

Some of our integration tests are Windows-only ([WindowsTest]), which Claude and I obviously can't run on my Mac. The only option currently is to push to a PR and let CI run, but that's a long feedback loop. This PR adds a new GitHub Actions-based mechanism to shorten the loop for agents.

Results

Adds .github/workflows/windows-test.yml: a workflow_dispatch-only job on windows-latest that takes a required filter and runs dotnet test against Octopus.Tentacle.Tests.Integration (--framework net8.0 --filter <filter>). No push or pull_request trigger, so it only runs when someone asks.

Also adds a run-windows-tests skill under .claude/skills/ that dispatches the workflow and watches it to a pass/fail.

I've run it end to end: CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang passed on the runner in 543 ms, and the whole loop took about 5 minutes from start to finish.
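
For illustration, the dispatch-and-watch loop could look roughly like the sketch below. This is not the contents of run.sh: the function names are hypothetical, `main` as the default ref is an assumption, and it presumes the GitHub CLI (`gh`) is installed and authenticated.

```shell
# Sketch only: hypothetical helpers, not the actual run.sh from this PR.

# Dispatch the workflow_dispatch-only workflow with a required filter.
dispatch_windows_tests() {
  local filter="${1:?usage: dispatch_windows_tests <filter> [ref]}"
  local ref="${2:-main}"   # assumption: the default branch is main
  gh workflow run windows-test.yml -f filter="$filter" --ref "$ref"
}

# Follow a run to completion; --exit-status makes gh exit non-zero if the run fails.
watch_windows_tests() {
  local run_id="${1:?usage: watch_windows_tests <run-id>}"
  gh run watch "$run_id" --exit-status
}
```

The run ID passed to `watch_windows_tests` has to be looked up separately after dispatching, for example with `gh run list`.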

How to review this PR

  • 🤷 It's a brave new agentic world?

Reducing risk

This is dev tooling only; I can't see any real risk.

[JIM_BOT.EXE v2.13]

jimmyp and others added 7 commits June 1, 2026 16:07
Windows-only tests ([WindowsTest]) compile on macOS/Apple-Silicon but skip at
runtime. This skill runs them on a local UTM Windows 11 ARM VM over loopback SSH
and returns results synchronously, so they're drivable from the sandbox.

- setup.sh: one-time orchestrator (installs UTM, fetches ISO, serves provision.ps1
  to the guest, configures the loopback SSH port forward, verifies the toolchain).
  Pauses only at the two GUI steps UTM has no CLI for.
- run.sh: per-run entry point. Resolves VM state (absent -> exit 3, stopped ->
  utmctl start + wait, running), rsyncs source (no build artifacts), runs
  dotnet test --filter, streams output back.
- provision.ps1: in-guest setup (OpenSSH, .NET 8 SDK, rsync, authorized key).
- SKILL.md: triggers on Windows-only/skipped tests; documents the exit-code contract.

First-run-validate quality: the VM-dependent paths haven't been exercised against
a real VM yet. Verified: exit-code branches, syntax, exec bits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop UTM (GUI-locked creation) for qemu-system-aarch64 directly, so VM creation
and Windows install are fully scripted and re-runnable:

- lib.sh: shared config + QEMU arg assembly (hvf, edk2 pflash, NVMe disk, virtio-net
  with loopback hostfwd, swtpm TPM 2.0, headless).
- setup.sh: installs qemu/swtpm, builds firmware vars + disk, fetches virtio-win,
  builds an autounattend answer ISO, boots the unattended installer, waits for SSH.
- autounattend.xml: hands-free Win11 ARM64 install (TPM via swtpm), creates dev user
  with autologon, runs provision.ps1 from the answer CD on first logon.
- provision.ps1: installs virtio-net driver, OpenSSH + authorized key, .NET 8 SDK, rsync.
- start-vm.sh: idempotent headless boot + SSH wait.
- run.sh: ensure VM up, rsync source (no build artifacts), dotnet test --filter.

Only non-automated step: acquiring the MS-gated Windows ARM64 ISO.

Verified: bash syntax (all), autounattend.xml well-formed, run.sh exit-2/exit-3
branches, firmware path resolves. Unverified (needs ~30-min install on real hardware):
the install/provision path end-to-end. Likely first-pass fixes: autounattend edition
name, virtio-net ARM64 driver path, rsync cygwin dest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Removes the last manual step: when no ISO is present, setup.sh now calls
fetch-iso.sh, which resolves the latest non-Insider Windows 11 ARM64 build from
the UUP dump API, downloads the UUP payload from Microsoft's update servers, and
converts it to an ISO locally (aria2 + wimlib + cdrtools). Setup is now end-to-end
clickless.

Verified: bash syntax (all), the UUP build-selection jq filter (picks newest
non-Insider arm64). Unverified (needs network + ~5 GB + real run): the get.php
package params and the converter invocation — flagged first-run-validate in fetch-iso.sh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

First live run got past the UUP API resolve (build id resolved fine) but the
converter aborted: it hard-requires chntpw, which can't build on Apple Silicon
(the sidneys tap's openssl@1.0 fails its test suite — known EC-curve bug).

Fix: do the API resolve + package download on the Mac, then run uup_download_linux.sh
inside debian:bookworm where aria2/cabextract/wimtools/chntpw/genisoimage install
via apt. The converter's host arch is irrelevant to the ARM64 Windows payload.
Requires Docker (checked up front).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The raw-QEMU local VM was abandoned: brew's edk2 firmware on Apple Silicon only
enumerates USB storage and won't boot the Windows installer (NVMe/virtio block never
appear; drops to UEFI shell). Documented in project memory.

Instead, run Windows-only tests on a windows-latest runner:
- .github/workflows/windows-test.yml: setup-dotnet (global.json 8.0.413), then
  dotnet test Octopus.Tentacle.Tests.Integration --framework net8.0 --filter <filter>.
  Triggered by push to this branch, or workflow_dispatch (filter input) once on main.
- skills/run-windows-tests: run.sh dispatches the workflow and `gh run watch`es it to a
  pass/fail exit; SKILL.md documents the gh-driven loop. Removed all QEMU scaffolding.

github.com is reachable from the sandbox (gh needs sandbox disabled for TLS), so this
loop is drivable end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…dPipes

The previous filter (CancelThenAbandon_...) matched nothing on main, where the test is
named CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang. The shared
substring matches the Windows grandchild test on both main and the EFT-3295 branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the push trigger (CLI-triggered only) and the default filter (filter is now a
required workflow_dispatch input, passed via env to avoid script injection; run.sh errors
if no filter is given). Dispatch-only means the workflow must live on the default branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
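
To illustrate why passing the filter through the environment avoids script injection, here is a small self-contained sketch. It is a demonstration under assumptions, not the workflow itself: plain `sh` stands in for whatever shell the workflow step uses, and `hostile`, `run_inlined` and `run_via_env` are hypothetical names.

```shell
# A hostile filter value that tries to smuggle in a second command.
hostile='x"; echo INJECTED; echo "'

# Unsafe: the value is pasted into the script text before the shell parses it,
# which is roughly what inlining an expression into a run: block amounts to.
run_inlined() {
  sh -c "echo \"filter=$1\""
}

# Safer: the script text is fixed and the value arrives as an environment
# variable, so the shell only ever treats it as data.
run_via_env() {
  FILTER="$1" sh -c 'echo "filter=$FILTER"'
}

run_inlined "$hostile"   # the smuggled echo runs as a separate command
run_via_env "$hostile"   # the whole value is printed as one literal string
```

With the inlined form the payload becomes part of the script and executes; with the env form it stays an inert string.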
@jimmyp jimmyp requested a review from a team as a code owner June 1, 2026 23:10
@jimmyp jimmyp enabled auto-merge (squash) June 1, 2026 23:45
@rhysparry rhysparry left a comment

Nothing blocking, but a few possible improvements. Action versions are probably worth fixing.

Comment thread .claude/skills/run-windows-tests/run.sh Outdated
gh workflow run windows-test.yml -f filter="$FILTER" --ref "$REF"

# Give GitHub a moment to register the run, then find and watch it.
sleep 5
Contributor

I'm curious whether this will be flaky in practice, but can be addressed if it proves to be an issue.

Contributor Author

Yeah... Claude, is there a better way to do this than a sleep?

Comment on lines +16 to +17
(A local QEMU VM was tried and abandoned: brew's edk2 firmware on Apple Silicon only
enumerates USB storage and won't boot the Windows installer. See the project memory.)
Contributor

Not sure how helpful this is to the agent. I don't think memory is shared in the repo, right?

Contributor Author

Claude remove this

## How to run

```bash
.claude/skills/run-windows-tests/run.sh "Name~WhenGrandchildHoldsRedirectedPipes"
```
Contributor

Is it clear that this is an example?

Contributor Author

@jimmyp jimmyp Jun 2, 2026

Claude, make it clear this is an example test name; you don't need to pass this one specifically. Tell the user (which is you) how to construct this parameter.
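
As a rough sketch of how such a filter could be constructed: in `dotnet test` filter expressions, `~` means "contains" and `|` means "or", so several test-name fragments can be joined into one filter. The helper name `build_name_filter` is hypothetical and not part of this PR.

```shell
# Hypothetical helper: join test-name fragments into one dotnet test filter,
# e.g. "Name~Foo|Name~Bar" (matches tests whose name contains Foo or Bar).
build_name_filter() {
  local filter=""
  local fragment
  for fragment in "$@"; do
    # Prefix with "<existing>|" only when the filter is already non-empty.
    filter="${filter:+$filter|}Name~$fragment"
  done
  printf '%s\n' "$filter"
}

build_name_filter WhenGrandchildHoldsRedirectedPipes
```

The result can then be passed as the single argument to run.sh, for example `.claude/skills/run-windows-tests/run.sh "$(build_name_filter WhenGrandchildHoldsRedirectedPipes)"`. Keep the quotes: `|` is a pipe to the shell if left unquoted.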

Comment thread .github/workflows/windows-test.yml Outdated
Comment thread .github/workflows/windows-test.yml Outdated
@rhysparry rhysparry disabled auto-merge June 1, 2026 23:50
@gb-8 gb-8 left a comment

Claude found some things (see below).

I'm not sure how problematic the hanging test issue is. Is there a cost implication? Or a throughput implication?

Other than that, they don't seem crucial, so I'm ✅ and leaving it up to you.


PR Review: #1245 — Run Windows-only tests on a GitHub Actions runner (on demand)

Author: Jim Pelletier | Base: main | Files: 3 | +110 / -0


Overview

A neat piece of dev tooling that closes a real pain point: [WindowsTest] tests skip entirely on macOS/Apple Silicon, leaving a slow push-to-CI feedback loop as the only option. This PR adds a manual workflow_dispatch GitHub Actions job and a Claude skill (run.sh + SKILL.md) to dispatch it and stream results.
The approach is well-scoped — dispatch-only, no accidental push/pull_request triggers, filter is mandatory.


What works well

  • Security: passing the filter through env: rather than inlining ${{ inputs.filter }} into the run: block is exactly right — prevents script injection.
  • set -euo pipefail and ${1:?...} in run.sh are good defensive defaults.
  • SKILL.md is unusually thorough — the "Common mistakes" section and the workflow_dispatch-on-default-branch gotcha are the kind of thing that would bite repeatedly without documentation.
  • actions/setup-dotnet@v4 with global-json-file: global.json correctly pins the SDK version.

Issues and suggestions

  1. Race condition in run.sh (medium risk)

  sleep 5
  RID="$(gh run list --workflow windows-test.yml --branch "$REF" --limit 1 --json databaseId --jq '.[0].databaseId')"

Sleeping for 5 seconds and then taking the most recent run ID is fragile in two ways:

  • GitHub is occasionally slow to register a dispatch → the listed run may be a previous one on the same branch.
  • A parallel dispatch (another dev, a retry) → you watch the wrong run.

A cleaner fix is to record the time before dispatch and filter by createdAt:

  BEFORE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  gh workflow run windows-test.yml -f filter="$FILTER" --ref "$REF"
  sleep 8
  # >= so that a run created in the same second as BEFORE still matches
  RID="$(gh run list --workflow windows-test.yml --branch "$REF" --limit 5 \
    --json databaseId,createdAt \
    --jq "[.[] | select(.createdAt >= \"$BEFORE\")] | .[0].databaseId")"

  2. No timeout-minutes on the workflow job

A hanging test (or a filter that unexpectedly matches more tests than intended) can run until it hits the GitHub Actions default 6-hour job timeout. Consider:

  jobs:
    windows-test:
      runs-on: windows-latest
      timeout-minutes: 30

  3. No permissions block

The other workflow in this repo (approve-renovate-pull-request.yml) explicitly declares permissions. Even if only reads are needed, a restrictive permissions: block is good practice and communicates intent:

  permissions:
    contents: read

  4. --ref "$REF" fails confusingly if the workflow isn't on main

SKILL.md documents this limitation well, but run.sh will fail with a confusing gh error if run before the PR is merged. A guard at the top of run.sh would make the failure message actionable:

  # Verify the workflow exists on the default branch
  DEFAULT=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name')
  if ! gh api "repos/{owner}/{repo}/contents/.github/workflows/windows-test.yml?ref=$DEFAULT" &>/dev/null; then
    echo "ERROR: windows-test.yml is not yet on '$DEFAULT' — merge this PR first." >&2
    exit 1
  fi


Minor nits

  • run.sh line 22: # Give GitHub a moment to register the run, then find and watch it. — the comment implies this is expected to be reliable; worth softening ("may need a longer sleep on a busy runner").
  • The SKILL.md references "project memory" for the QEMU/edk2 story — fine for internal use, but that memory file isn't in this PR. Not a blocker.

Summary

Solid PR. The core design (dispatch-only, mandatory filter, env-based injection guard) is correct and the documentation is genuinely good. The main things worth fixing before merge are the 4 items above.

jimmyp and others added 3 commits June 2, 2026 10:04
Co-authored-by: Rhys Parry <rhys.parry@octopus.com>
Co-authored-by: Rhys Parry <rhys.parry@octopus.com>
- workflow: add `permissions: contents: read` and `timeout-minutes: 30` (caps a hanging
  test at 30 min instead of the 6-hour default — answers the cost/throughput question).
- run.sh: replace the blind `sleep 5` + take-latest with a bounded poll that matches the
  run we just dispatched by creation time (handles slow registration and parallel
  dispatches); add a guard that fails with an actionable message if the workflow isn't on
  the default branch yet, instead of a confusing gh error.
- SKILL.md: remove the project-memory reference (that file isn't in the repo); rewrite the
  "How to run" section so the filter is clearly an example and explain how to construct one.

(actions/checkout@v6 and actions/setup-dotnet@v5 already applied via review suggestions.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
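
A bounded poll of the kind described in this commit might look like the sketch below. It is an illustration, not the merged run.sh: `find_dispatched_run` and its parameters are hypothetical, and it assumes the local clock is close enough to GitHub's for the `createdAt` comparison to be meaningful.

```shell
# Hypothetical helper: poll until a run created at or after $1 shows up on
# branch $2, then print its ID. Gives up after $3 attempts, $4 seconds apart.
find_dispatched_run() {
  local since="$1"
  local ref="$2"
  local attempts="${3:-10}"
  local delay="${4:-3}"
  local i=0
  local rid
  while [ "$i" -lt "$attempts" ]; do
    # Pick the earliest run created at or after the recorded time;
    # "// empty" prints nothing when no run matches yet.
    rid="$(gh run list --workflow windows-test.yml --branch "$ref" --limit 5 \
      --json databaseId,createdAt \
      --jq "[.[] | select(.createdAt >= \"$since\")] | sort_by(.createdAt) | .[0].databaseId // empty")"
    if [ -n "$rid" ]; then
      printf '%s\n' "$rid"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No run created since $since appeared on $ref" >&2
  return 1
}
```

A caller would record `since="$(date -u +%Y-%m-%dT%H:%M:%SZ)"` immediately before `gh workflow run`, then pass the printed ID to `gh run watch`. Two dispatches landing in the same window can still be confused with each other, so this narrows the race rather than eliminating it.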
Comment thread .claude/skills/run-windows-tests/run.sh Outdated
if ! gh api "repos/{owner}/{repo}/contents/.github/workflows/windows-test.yml?ref=$DEFAULT" >/dev/null 2>&1; then
  echo "ERROR: windows-test.yml is not on the default branch ('$DEFAULT') yet, so workflow_dispatch can't see it. Merge this branch to '$DEFAULT' first." >&2
  exit 1
fi
Contributor Author

Delete this

It only guarded the one-time pre-merge state; once the workflow is on the default
branch it always passes, so it's just an extra gh round-trip on every run. Removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jimmyp jimmyp enabled auto-merge (squash) June 2, 2026 00:15
@jimmyp jimmyp merged commit 165840e into main Jun 2, 2026
51 checks passed
@jimmyp jimmyp deleted the jimpelletier/windows-test-runner branch June 2, 2026 01:38

4 participants