Run Windows-only tests on a GitHub Actions runner (on demand) by jimmyp · Pull Request #1245 · OctopusDeploy/OctopusTentacle

jimmyp · 2026-06-01T23:10:57Z

Background

Some of our integration tests are Windows-only ([WindowsTest]), which Claude and I obviously can't run on my Mac. The only option currently is to push to a PR and let CI run, but that's a long feedback loop. This PR adds a new GitHub Actions-based mechanism to shorten the loop for agents.

Results

Adds .github/workflows/windows-test.yml: a workflow_dispatch-only job on windows-latest that takes a required filter and runs dotnet test against Octopus.Tentacle.Tests.Integration (--framework net8.0 --filter <filter>). No push or pull_request trigger, so it only runs when someone asks.

Also adds a run-windows-tests skill under .claude/skills/ that dispatches the workflow and watches it to a pass/fail.

I've run it end to end: CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang passed on the runner in 543ms, about 5 minutes start to finish.

How to review this PR

🤷 It's a brave new agentic world?

Reducing risk

This is dev tooling only; I can't see any real risk.

[JIM_BOT.EXE v2.13]

Windows-only tests ([WindowsTest]) compile on macOS/Apple-Silicon but skip at runtime. This skill runs them on a local UTM Windows 11 ARM VM over loopback SSH and returns results synchronously, so they're drivable from the sandbox.

- setup.sh: one-time orchestrator (installs UTM, fetches ISO, serves provision.ps1 to the guest, configures the loopback SSH port forward, verifies the toolchain). Pauses only at the two GUI steps UTM has no CLI for.
- run.sh: per-run entry point. Resolves VM state (absent -> exit 3, stopped -> utmctl start + wait, running), rsyncs source (no build artifacts), runs dotnet test --filter, streams output back.
- provision.ps1: in-guest setup (OpenSSH, .NET 8 SDK, rsync, authorized key).
- SKILL.md: triggers on Windows-only/skipped tests; documents the exit-code contract.

First-run-validate quality: the VM-dependent paths haven't been exercised against a real VM yet. Verified: exit-code branches, syntax, exec bits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop UTM (GUI-locked creation) for qemu-system-aarch64 directly, so VM creation and Windows install are fully scripted and re-runnable:

- lib.sh: shared config + QEMU arg assembly (hvf, edk2 pflash, NVMe disk, virtio-net with loopback hostfwd, swtpm TPM 2.0, headless).
- setup.sh: installs qemu/swtpm, builds firmware vars + disk, fetches virtio-win, builds an autounattend answer ISO, boots the unattended installer, waits for SSH.
- autounattend.xml: hands-free Win11 ARM64 install (TPM via swtpm), creates dev user with autologon, runs provision.ps1 from the answer CD on first logon.
- provision.ps1: installs virtio-net driver, OpenSSH + authorized key, .NET 8 SDK, rsync.
- start-vm.sh: idempotent headless boot + SSH wait.
- run.sh: ensure VM up, rsync source (no build artifacts), dotnet test --filter.

Only non-automated step: acquiring the MS-gated Windows ARM64 ISO.

Verified: bash syntax (all), autounattend.xml well-formed, run.sh exit-2/exit-3 branches, firmware path resolves.

Unverified (needs ~30-min install on real hardware): the install/provision path end-to-end. Likely first-pass fixes: autounattend edition name, virtio-net ARM64 driver path, rsync cygwin dest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Removes the last manual step: when no ISO is present, setup.sh now calls fetch-iso.sh, which resolves the latest non-Insider Windows 11 ARM64 build from the UUP dump API, downloads the UUP payload from Microsoft's update servers, and converts it to an ISO locally (aria2 + wimlib + cdrtools). Setup is now end-to-end clickless. Verified: bash syntax (all), the UUP build-selection jq filter (picks newest non-Insider arm64). Unverified (needs network + ~5 GB + real run): the get.php package params and the converter invocation — flagged first-run-validate in fetch-iso.sh. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

First live run got past the UUP API resolve (build id resolved fine) but the converter aborted: it hard-requires chntpw, which can't build on Apple Silicon (the sidneys tap's openssl@1.0 fails its test suite — known EC-curve bug). Fix: do the API resolve + package download on the Mac, then run uup_download_linux.sh inside debian:bookworm where aria2/cabextract/wimtools/chntpw/genisoimage install via apt. The converter's host arch is irrelevant to the ARM64 Windows payload. Requires Docker (checked up front). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The raw-QEMU local VM was abandoned: brew's edk2 firmware on Apple Silicon only enumerates USB storage and won't boot the Windows installer (NVMe/virtio block never appear; drops to UEFI shell). Documented in project memory.

Instead, run Windows-only tests on a windows-latest runner:

- .github/workflows/windows-test.yml: setup-dotnet (global.json 8.0.413), then dotnet test Octopus.Tentacle.Tests.Integration --framework net8.0 --filter <filter>. Triggered by push to this branch, or workflow_dispatch (filter input) once on main.
- skills/run-windows-tests: run.sh dispatches the workflow and `gh run watch`es it to a pass/fail exit; SKILL.md documents the gh-driven loop.

Removed all QEMU scaffolding. github.com is reachable from the sandbox (gh needs sandbox disabled for TLS), so this loop is drivable end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…dPipes The previous filter (CancelThenAbandon_...) matched nothing on main, where the test is named CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang. The shared substring matches the Windows grandchild test on both main and the EFT-3295 branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the push trigger (CLI-triggered only) and the default filter (filter is now a required workflow_dispatch input, passed via env to avoid script injection; run.sh errors if no filter is given). Dispatch-only means the workflow must live on the default branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
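To illustrate the env-based hand-off this commit describes, here is a minimal sketch of how a step might forward the filter. It assumes the step runs under bash and that the workflow maps the input to an environment variable named FILTER; `run_filtered_tests` is a hypothetical helper, not code from the PR.

```bash
# Sketch only: forward an untrusted filter to dotnet test as a single argument.
# Assumption: the workflow step exposes the dispatch input as the env var FILTER.
run_filtered_tests() {
  # Abort with a message when FILTER is unset or empty.
  : "${FILTER:?FILTER must be set}"
  # Quoting keeps the value as one argv entry, so shell metacharacters
  # inside it are passed through as data and never evaluated.
  dotnet test Octopus.Tentacle.Tests.Integration \
    --framework net8.0 \
    --filter "$FILTER"
}
```

Because the value reaches the shell as data in an environment variable rather than as text pasted into the script, a filter containing `;` or quotes is handed to dotnet test verbatim instead of being executed.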

rhysparry

Nothing blocking, but a few possible improvements. Action versions are probably worth fixing.

rhysparry · 2026-06-01T23:43:47Z

+gh workflow run windows-test.yml -f filter="$FILTER" --ref "$REF"
+
+# Give GitHub a moment to register the run, then find and watch it.
+sleep 5


I'm curious whether this will be flaky in practice, but can be addressed if it proves to be an issue.

Yeah... Claude, is there a better way to do this than a sleep?

rhysparry · 2026-06-01T23:45:31Z

+(A local QEMU VM was tried and abandoned: brew's edk2 firmware on Apple Silicon only
+enumerates USB storage and won't boot the Windows installer. See the project memory.)


Not sure how helpful this is to the agent. I don't think memory is shared in the repo, right?

Claude remove this

rhysparry · 2026-06-01T23:46:17Z

+## How to run
+
+```bash
+.claude/skills/run-windows-tests/run.sh "Name~WhenGrandchildHoldsRedirectedPipes"


Is it clear that this is an example?

Claude, make it clear this is an example test name; you don't need to pass this specifically. Tell the user (which is you) how to construct this parameter.
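For reference, the parameter is a `dotnet test --filter` expression: `Name~Fragment` matches tests whose name contains the fragment, `Name=Exact` matches a full name, and `|` (or) and `&` (and) combine terms. A small sketch of composing one from name fragments follows; `build_name_filter` is a hypothetical helper, not part of run.sh, and it assumes the fragments contain no characters that need escaping inside a filter.

```bash
# Sketch only: build a dotnet test --filter expression from test-name fragments.
# build_name_filter is a hypothetical helper name.
build_name_filter() {
  local filter="" fragment
  for fragment in "$@"; do
    # Name~X matches any test whose name contains X; | joins terms with OR.
    filter="${filter:+$filter|}Name~$fragment"
  done
  printf '%s\n' "$filter"
}

# Two fragments joined with OR.
build_name_filter WhenGrandchildHoldsRedirectedPipes CancellationToken
# prints: Name~WhenGrandchildHoldsRedirectedPipes|Name~CancellationToken
```

The result can be passed as the single quoted argument to run.sh. Quoting matters because `|` and `&` are shell operators when left bare.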

gb-8

Claude found some things (see below).

I'm not sure how problematic the hanging test issue is. Is there a cost implication? Or a throughput implication?

Other than that, they don't seem crucial, so I'm ✅ and leaving it up to you.

PR Review: #1245 — Run Windows-only tests on a GitHub Actions runner (on demand)

Author: Jim Pelletier | Base: main | Files: 3 | +110 / -0

Overview

A neat piece of dev tooling that closes a real pain point: [WindowsTest] tests skip entirely on macOS/Apple Silicon, leaving a slow push-to-CI feedback loop as the only option. This PR adds a manual workflow_dispatch GitHub Actions job and a Claude skill (run.sh + SKILL.md) to dispatch it and stream results.
The approach is well-scoped — dispatch-only, no accidental push/pull_request triggers, filter is mandatory.

What works well

Security: passing the filter through env: rather than inlining ${{ inputs.filter }} into the run: block is exactly right — prevents script injection.
set -euo pipefail and ${1:?...} in run.sh are good defensive defaults.
SKILL.md is unusually thorough — the "Common mistakes" section and the workflow_dispatch-on-default-branch gotcha are the kind of thing that would bite repeatedly without documentation.
actions/setup-dotnet@v4 with global-json-file: global.json correctly pins the SDK version.
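As a brief illustration of the `${1:?...}` default mentioned above, the sketch below shows how the expansion rejects a missing or empty argument. `require_filter` and the usage message are hypothetical; the actual wording in run.sh is not shown in this conversation.

```bash
# Sketch only: fail fast when the required positional argument is missing.
# require_filter is a hypothetical helper name.
require_filter() {
  # ${1:?message} writes the message to stderr and exits non-zero
  # when $1 is unset or empty; otherwise it expands to $1.
  local filter="${1:?usage: run.sh <dotnet-test-filter>}"
  printf '%s\n' "$filter"
}
```

In a non-interactive script the failed expansion ends the whole script, so nothing is dispatched without a filter.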

Issues and suggestions

Race condition in run.sh (medium risk)

  sleep 5
  RID="$(gh run list --workflow windows-test.yml --branch "$REF" --limit 1 --json databaseId --jq '.[0].databaseId')"

sleep 5 then take the most-recent run ID is fragile in two ways:

- GitHub is occasionally slow to register a dispatch → the listed run may be a previous one on the same branch.
- A parallel dispatch (another dev, a retry) → you watch the wrong run.

gh workflow run can emit the new run URL with --json if you capture stderr, but the cleaner fix is to record the time before dispatch and filter by createdAt:

  BEFORE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  gh workflow run windows-test.yml -f filter="$FILTER" --ref "$REF"
  sleep 8
  RID="$(gh run list --workflow windows-test.yml --branch "$REF" --limit 5 \
    --json databaseId,createdAt \
    --jq "[.[] | select(.createdAt > \"$BEFORE\")] | .[0].databaseId")"

No timeout-minutes on the workflow job

A hanging test (or a test that unexpectedly runs more than the filtered set) will consume the GitHub Actions default 6-hour limit. Consider:

  jobs:
    windows-test:
      runs-on: windows-latest
      timeout-minutes: 30

No permissions block

The other workflow in this repo (approve-renovate-pull-request.yml) explicitly declares permissions. Even if only reads are needed, a restrictive permissions: block is good practice and communicates intent:

  permissions:
    contents: read

--ref "$REF" silently fails if workflow isn't on main

SKILL.md documents this limitation well, but run.sh will fail with a confusing gh error if run before the PR is merged. A guard at the top of run.sh would make the failure message actionable:

  # Verify the workflow exists on the default branch
  DEFAULT=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name')
  if ! gh api "repos/{owner}/{repo}/contents/.github/workflows/windows-test.yml?ref=$DEFAULT" &>/dev/null; then
    echo "ERROR: windows-test.yml is not yet on '$DEFAULT'; merge this PR first." >&2
    exit 1
  fi

Minor nits

run.sh line 22: # Give GitHub a moment to register the run, then find and watch it. — the comment implies this is expected to be reliable; worth softening ("may need a longer sleep on a busy runner").
The SKILL.md references "project memory" for the QEMU/edk2 story — fine for internal use, but that memory file isn't in this PR. Not a blocker.

Summary

Solid PR. The core design (dispatch-only, mandatory filter, env-based injection guard) is correct and the documentation is genuinely good. The main things worth fixing before merge are the 4 items above.

Co-authored-by: Rhys Parry <rhys.parry@octopus.com>

- workflow: add `permissions: contents: read` and `timeout-minutes: 30` (caps a hanging test at 30 min instead of the 6-hour default, which answers the cost/throughput question).
- run.sh: replace the blind `sleep 5` + take-latest with a bounded poll that matches the run we just dispatched by creation time (handles slow registration and parallel dispatches); add a guard that fails with an actionable message if the workflow isn't on the default branch yet, instead of a confusing gh error.
- SKILL.md: remove the project-memory reference (that file isn't in the repo); rewrite the "How to run" section so the filter is clearly an example and explain how to construct one.

(actions/checkout@v6 and actions/setup-dotnet@v5 already applied via review suggestions.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
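The bounded poll described in this commit is not quoted in the conversation, so the following is only a sketch of what such a loop could look like, built from the `gh run list` call in the review. `find_dispatched_run`, `POLL_ATTEMPTS` and `POLL_INTERVAL` are hypothetical names, and the sketch assumes the local clock is reasonably close to GitHub's, since it compares a locally recorded timestamp with `createdAt`.

```bash
# Sketch only: wait for the run created by our dispatch instead of a fixed sleep.
# find_dispatched_run, POLL_ATTEMPTS and POLL_INTERVAL are hypothetical names.
find_dispatched_run() {
  local ref="$1" before="$2"
  local attempts="${POLL_ATTEMPTS:-15}" interval="${POLL_INTERVAL:-2}"
  local i=0 rid=""
  while [ "$i" -lt "$attempts" ]; do
    # Keep only runs created at or after the timestamp recorded before dispatch;
    # "// empty" turns a missing match into empty output so the loop retries.
    rid="$(gh run list --workflow windows-test.yml --branch "$ref" --limit 5 \
      --json databaseId,createdAt \
      --jq "[.[] | select(.createdAt >= \"$before\")] | .[0].databaseId // empty" || true)"
    if [ -n "$rid" ]; then
      printf '%s\n' "$rid"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "no run created since $before after $attempts attempts" >&2
  return 1
}

# Usage (assumes gh is authenticated and FILTER and REF are set):
#   BEFORE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
#   gh workflow run windows-test.yml -f filter="$FILTER" --ref "$REF"
#   RID="$(find_dispatched_run "$REF" "$BEFORE")"
#   gh run watch "$RID" --exit-status
```

The attempt cap keeps the script from waiting forever when the dispatch never registers. Two dispatches on the same branch within the same window can still be confused with each other, so this narrows the race rather than eliminating it.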

jimmyp · 2026-06-02T00:12:40Z

+if ! gh api "repos/{owner}/{repo}/contents/.github/workflows/windows-test.yml?ref=$DEFAULT" >/dev/null 2>&1; then
+  echo "ERROR: windows-test.yml is not on the default branch ('$DEFAULT') yet, so workflow_dispatch can't see it. Merge this branch to '$DEFAULT' first." >&2
+  exit 1
+fi


Delete this

It only guarded the one-time pre-merge state; once the workflow is on the default branch it always passes, so it's just an extra gh round-trip on every run. Removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jimmyp and others added 7 commits June 1, 2026 16:07

jimmyp requested a review from a team as a code owner June 1, 2026 23:10

LukeButters approved these changes Jun 1, 2026

Merge branch 'main' into jimpelletier/windows-test-runner

d408083

jimmyp enabled auto-merge (squash) June 1, 2026 23:45

rhysparry approved these changes Jun 1, 2026

rhysparry disabled auto-merge June 1, 2026 23:50

gb-8 approved these changes Jun 1, 2026

jimmyp and others added 3 commits June 2, 2026 10:04

Apply suggestion from @rhysparry

3ffa45b

Co-authored-by: Rhys Parry <rhys.parry@octopus.com>

Apply suggestion from @rhysparry

a6c6a6a

Co-authored-by: Rhys Parry <rhys.parry@octopus.com>

jimmyp commented Jun 2, 2026

run.sh: drop the default-branch guard

337eac3

jimmyp enabled auto-merge (squash) June 2, 2026 00:15

jimmyp merged commit 165840e into main Jun 2, 2026
51 checks passed

jimmyp deleted the jimpelletier/windows-test-runner branch June 2, 2026 01:38

jimmyp merged 12 commits into main from jimpelletier/windows-test-runner

4 participants
