diff --git a/.claude/skills/analyze-action-pr/SKILL.md b/.claude/skills/analyze-action-pr/SKILL.md index 2103c5bf..d048404d 100644 --- a/.claude/skills/analyze-action-pr/SKILL.md +++ b/.claude/skills/analyze-action-pr/SKILL.md @@ -19,7 +19,7 @@ --- name: analyze-action-pr -description: Triage a PR that adds or bumps an action in this repo's allowlist. Runs verify-action-build, classifies each failing action (clean / pipe-to-shell / unverified-download / nested-action-issue / verify-script-bug), and proposes concrete next actions — recommend approval, open an upstream issue + ping the PR author, or fix verify-action-build itself with a regression test. Use when the user says "analyze PR ", "triage PR ", "verify PR ", or otherwise asks to review an action-allowlist PR in this repo. +description: Triage a PR that adds or bumps an action in this repo's allowlist. Runs verify-action-build, classifies each failing action (clean / pipe-to-shell / unverified-download / nested-action-issue / metadata-only / verify-script gap / in-tree native binaries) and proposes concrete next actions — recommend approval, open an upstream issue + ping the PR author, or fix verify-action-build itself with a regression test. Use when the user says "analyze PR ", "triage PR ", "verify PR ", or otherwise asks to review an action-allowlist PR in this repo. --- # Analyze an apache/infrastructure-actions PR @@ -69,6 +69,7 @@ under "Classify". | **C** | nested-action issue | Top-level action passes but a `uses:` dependency (e.g. `install/foo`) hits A or B | | **D** | metadata-only | `No LICENSE`, input interpolation in `run:` blocks, `GITHUB_PATH` writes — soft warnings, mention but don't block | | **E** | verify-script gap | The script gets the wrong answer for a reason unrelated to the action's actual security: false positive (regex hole, missing pattern), missing capability (new action type / build flow / verification mechanism it doesn't yet recognize), bad attribution (extractor drops an action that's clearly in the diff), or a check that misreads a legitimate input shape | +| **F** | unverified in-tree binaries | The action ships pre-compiled native binaries directly in the repo (Go cross-compile, `.exe` / `.dll` / `.so` / `.dylib`, `.jar`, `.wasm`, etc.) and exec's them from a small launcher. `verify-action-build`'s `In-tree binary check` tries to reconcile each binary with verifiable upstream provenance — first via `gh attestation verify --owner ` (SLSA attestation transparency log), then by comparing SHA256 against the release's `SHA256SUMS` asset. Binaries that pass either check are ✓; those that pass neither are case **F** and should be rejected until upstream adds provenance (`actions/attest-build-provenance` or a signed `SHA256SUMS`). Successful verification is a hard pass: the binary's bytes are tied back to the workflow run that produced them or to a release-time checksum | ### 4. Look up upstream verification material (for A/B/C) @@ -116,6 +117,50 @@ Two messages, both held until the user OKs: Do **not** approve the action. +#### F — unverified in-tree native binaries + +`verify-action-build`'s `In-tree binary check` reports a per-binary +result. Three outcomes: + +- ✓ **Verified via `gh attestation verify`** — the binary's bytes are + in GitHub's attestation transparency log under the action's owner. + Pass; recommend approval. +- ✓ **Verified against `SHA256SUMS` release asset** — the binary's + SHA256 matches the release's checksum file. Slightly weaker than + attestation (no signature on the checksum file itself yet) but a + strong cryptographic chain from release to artifact. Pass. +- ✗ **Unverified** — neither mechanism worked: no attestation, no + matching SHA256SUMS entry, or the hash didn't match. Hard reject. + +For the unverified case, two messages, both held until the user OKs: + +1. **Upstream issue** asking the action's maintainers to add either + `actions/attest-build-provenance` to their release workflow (so the + binaries get a SLSA attestation verifiable via `gh attestation + verify`) or a `SHA256SUMS` file shipped as a release asset. Quote + the offending paths and explain the downstream-review impact. See + the runs-on/action#36 / runs-on/action#37 thread as a worked + example: the maintainer accepted a one-line PR that added + `actions/attest-build-provenance` and started shipping `SHA256SUMS` + on each release; once that landed, future bumps verified + automatically. +2. **Comment** on the ASF PR pinging the PR author, summarising why + approval is held: the JS-rebuild check only verifies the launcher, + not the binary, and we have no way to tie what runs on the runner + back to the published source. Link the upstream issue. + +Before recommending retroactive rejection of an already-approved version +in this shape, **check downstream impact** with code search: + +``` +gh api 'search/code?q=%2F+org%3Aapache+language%3AYAML' \ + --jq '.items[] | "\(.repository.full_name) \(.path)"' +``` + +If multiple ASF projects depend on the action, loop in their maintainers +(e.g. `@apache/-committers`) before any retroactive decision — +they pay the cost of pinning back to an older version or migrating off. + #### D only (passing verification but with metadata warnings) Note the warnings and recommend approval — the user typically approves @@ -229,3 +274,6 @@ future runs can cite a precedent instead of re-deriving the analysis. | #802 | carabiner-dev nested `install/{ampel,bnd}` do `curl + chmod 0755`; upstream ships SLSA `provenance.json` / `attestations.jsonl` | C | Upstream issue carabiner-dev/actions#51 + PR comment | | #803 | 3 actions in one diff; extractor only got the wholly-new key | E | Fix landed in PR #804 | | #806 | `jbangdev/setup-jbang` does `curl ... \| bash`; upstream ships SHA256/GPG | A | Upstream issue jbangdev/setup-jbang#16 + PR comment | +| #813 | `browser-actions/setup-firefox@v1.7.2` ships a minimal `{"type":"module"}` package.json with no deps; lock-file check too strict | E | Fix landed in PR #816 | +| #809 | `runs-on/action@v2.1.1` ships ~10 MB of UPX-packed Go binaries (`main-linux-amd64`, `main-linux-arm64`, `main-windows-amd64.exe`); launcher exec's them as root; no SLSA, no SHA256SUMS | F | Upstream issue runs-on/action#36; deferred until upstream adds provenance | +| #825 | `runs-on/action@v2.1.2` — same in-tree binaries as v2.1.1, but upstream now ships SLSA attestations (`actions/attest-build-provenance` was wired in via runs-on/action#37) plus a `SHA256SUMS` release asset | F (verified) | Pass — `In-tree binary check` reports all 3 binaries verified via `gh attestation verify` | diff --git a/README.md b/README.md index 98a7fc6b..27e74991 100644 --- a/README.md +++ b/README.md @@ -234,6 +234,7 @@ When reviewing an action (new or updated), watch for these potential issues in t - **Obfuscated code**: hex-encoded strings, base64 blobs, or intentionally unreadable code in source files (not in compiled `dist/`). - **File-system tampering**: writing to locations outside the workspace (`$GITHUB_WORKSPACE`), modifying `$GITHUB_ENV`, `$GITHUB_PATH`, or `$GITHUB_OUTPUT` in unexpected ways to influence subsequent workflow steps. - **Compiled JS mismatch**: any unexplained diff between the published `dist/` and a clean rebuild — this is the primary check the verification script performs. +- **Pre-compiled native binaries shipped in-tree**: actions that commit Go/Rust/C-style binaries (`main-linux-amd64`, `*.exe`, `*.dll`, `*.so`, `*.dylib`, `*.jar`, `*.wasm`, etc.) directly in the repo and exec them from a small launcher are running opaque executable code on the runner. The JS-rebuild check verifies the launcher but **cannot** reconcile the binaries with source on its own. `verify-action-build`'s **In-tree binary check** tries to close the gap automatically: each detected binary is verified first via `gh attestation verify --owner ` (the SLSA attestation transparency log populated by [`actions/attest-build-provenance`](https://github.com/actions/attest-build-provenance)), then by SHA256-comparing each binary against the release's `SHA256SUMS` asset. Binaries that pass either check are ✓; binaries that pass neither are a hard reject. Push back on actions in this shape until upstream adds attestation or `SHA256SUMS` so the chain from release to artifact can be verified. For the full approval policy and requirements, see the [ASF GitHub Actions Policy](https://infra.apache.org/github-actions-policy.html). diff --git a/utils/tests/verify_action_build/test_security.py b/utils/tests/verify_action_build/test_security.py index 5e2a419b..ed241733 100644 --- a/utils/tests/verify_action_build/test_security.py +++ b/utils/tests/verify_action_build/test_security.py @@ -22,6 +22,7 @@ analyze_binary_downloads, analyze_binary_downloads_recursive, analyze_dockerfile, + analyze_in_tree_binaries, analyze_lock_files, analyze_scripts, analyze_action_metadata, @@ -30,6 +31,8 @@ from verify_action_build.security import ( _file_is_pure_data_fetch, _find_binary_downloads_js, + _looks_like_in_tree_binary, + _parse_sha256sums, ) @@ -1231,3 +1234,361 @@ def test_sigstore_and_cosign_still_match(self): # Other existing patterns kept. for snippet in ("import * as sigstore from 'sigstore'", "cosign verify-blob"): assert self._has_verification(snippet) is True, snippet + + +class TestLooksLikeInTreeBinary: + """Path-only heuristic for catching pre-compiled binary files in an + action's tree. These tests pin the boundaries of the regex + extension + list so future tweaks don't accidentally widen or narrow the catch. + """ + + def test_go_cross_compile_naming(self): + # The runs-on/action shape: --(.exe)? + for name in ( + "main-linux-amd64", + "main-linux-arm64", + "main-darwin-amd64", + "main-darwin-arm64", + "main-windows-amd64.exe", + "tool-freebsd-amd64", + "agent-aix-ppc64le", + ): + assert _looks_like_in_tree_binary(name), name + + def test_known_binary_extensions(self): + for name in ( + "foo.exe", "foo.dll", "foo.so", "foo.dylib", "foo.bin", + "package.deb", "package.rpm", "Installer.msi", "App.dmg", + "archive.appimage", + "module.wasm", + "Library.jar", "App.war", "Foo.class", + "object.o", "object.a", "object.lib", "object.obj", + ): + assert _looks_like_in_tree_binary(name), name + + def test_extension_is_case_insensitive(self): + # Windows PE files often have an upper-case ``.EXE``. + assert _looks_like_in_tree_binary("Setup.EXE") is True + assert _looks_like_in_tree_binary("App.DLL") is True + + def test_normal_action_files_not_flagged(self): + for name in ( + "action.yml", "package.json", "package-lock.json", + "tsconfig.json", "README.md", "CHANGELOG.md", + "index.js", "post.js", "index.template.js", + "src/main.ts", "dist/index.js", "dist/index.js.map", + "go.mod", "go.sum", "main.go", + "Makefile", ".gitignore", "LICENSE", + ): + assert not _looks_like_in_tree_binary(name), name + + def test_substring_matches_dont_falsely_match(self): + # ``binsearch.md`` ends in ``-md`` but isn't ``-darwin-..``. + # ``setup-node-cache`` contains a hyphen but no os/arch suffix. + # ``configure-aws-credentials`` ditto. + for name in ( + "binsearch.md", + "setup-node-cache", + "configure-aws-credentials", + "node-fetch-helper.js", + "linux-distro-detect.sh", + ): + assert not _looks_like_in_tree_binary(name), name + + def test_licenses_txt_exempt(self): + # webpack/ncc commonly ship a licenses.txt next to the bundle. + # It's text metadata, not a binary, even though some bundlers + # name it confusingly. + assert _looks_like_in_tree_binary("dist/licenses.txt") is False + assert _looks_like_in_tree_binary("licenses.txt") is False + + +class TestParseSha256sums: + """Parse the standard `` `` format used by ``sha256sum`` + and emitted by GitHub's ``actions/attest-build-provenance`` example + workflows. Tolerant of comments, blank lines, the ``*`` binary-mode + marker and trailing whitespace; rejects malformed lines silently.""" + + def test_canonical_format(self): + text = ( + "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b main-linux-amd64\n" + "9c5e12573ff6d953068da8def2133d08a907d431c9ebf77ecff7036ef2792332 main-linux-arm64\n" + "75adbbab7aaf4ca05294d198a2175eeb921ababcd7faf9b07b328cddd4e13e54 main-windows-amd64.exe\n" + ) + result = _parse_sha256sums(text) + assert result == { + "main-linux-amd64": "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b", + "main-linux-arm64": "9c5e12573ff6d953068da8def2133d08a907d431c9ebf77ecff7036ef2792332", + "main-windows-amd64.exe": "75adbbab7aaf4ca05294d198a2175eeb921ababcd7faf9b07b328cddd4e13e54", + } + + def test_binary_mode_marker_stripped(self): + # ``sha256sum -b`` emits " *" — must still parse. + text = "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b *main-linux-amd64\n" + result = _parse_sha256sums(text) + assert "main-linux-amd64" in result + assert result["main-linux-amd64"].startswith("ef4c45e8") + + def test_uppercase_digest_normalised_to_lowercase(self): + text = "EF4C45E8F554EFA1C79C7EF8213856698209CD693D32DC268A02DD38FF770B1B X\n" + result = _parse_sha256sums(text) + assert result == {"X": "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b"} + + def test_comments_and_blank_lines_ignored(self): + text = ( + "# This is a comment\n" + "\n" + "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b X\n" + " \n" + ) + assert _parse_sha256sums(text) == { + "X": "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b", + } + + def test_malformed_lines_dropped(self): + text = ( + "not-a-hash main-linux-amd64\n" + "deadbeef too-short-digest\n" + "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b good\n" + "z" * 64 + " non-hex\n" + ) + assert _parse_sha256sums(text) == { + "good": "ef4c45e8f554efa1c79c7ef8213856698209cd693d32dc268a02dd38ff770b1b", + } + + def test_empty_input(self): + assert _parse_sha256sums("") == {} + + +class TestAnalyzeInTreeBinaries: + """Whole-action check. Detects pre-compiled binaries; verifies each + against ``gh attestation verify`` (preferred) or the release's + ``SHA256SUMS`` asset; only unverified binaries fail the action. + """ + + @staticmethod + def _mocked_env( + paths, + *, + tag=None, + sha256sums_text=None, + attest_ok=False, + blob_bytes=None, + ): + """Mock the whole verification cascade. Defaults: no tag found, + no release SHA256SUMS, gh attestation says no — i.e. nothing to + verify against, so detected binaries become hard errors.""" + # Use a tiny non-empty payload so SHA256 is deterministic across + # tests but not coincidentally any real release's hash. + default_bytes = b"BIN" + return _Patches( + mock.patch( + "verify_action_build.security._list_repo_files", + return_value=list(paths), + ), + mock.patch( + "verify_action_build.security._fetch_blob_bytes", + return_value=blob_bytes if blob_bytes is not None else default_bytes, + ), + mock.patch( + "verify_action_build.security._resolve_tag_for_commit", + return_value=tag, + ), + mock.patch( + "verify_action_build.security._fetch_release_asset_text", + return_value=sha256sums_text, + ), + mock.patch( + "verify_action_build.security._verify_via_gh_attestation", + return_value=attest_ok, + ), + ) + + def test_runs_on_action_shape_without_provenance_fails(self): + # The exact shape that prompted this check: pre-compiled Go + # binaries committed in the repo root next to a tiny launcher, + # AND no SHA256SUMS / attestation available — i.e. v2.1.1 era, + # before runs-on/action#37 added provenance. + paths = [ + ".gitignore", "LICENSE", "Makefile", "README.md", + "action.yml", "go.mod", "go.sum", + "index.js", "index.template.js", "post.js", "main.go", + "main-linux-amd64", "main-linux-arm64", + "main-windows-amd64.exe", + ] + with self._mocked_env(paths): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + msg = errors[0] + assert "main-linux-amd64" in msg + assert "3 unverified pre-compiled binary" in msg + + def test_runs_on_action_shape_with_attestation_passes(self): + # v2.1.2 era: same in-tree binaries, but now ``gh attestation + # verify`` succeeds for each. Action passes the check. + paths = [ + "action.yml", "go.mod", "go.sum", "index.js", "main.go", + "main-linux-amd64", "main-linux-arm64", + "main-windows-amd64.exe", + ] + with self._mocked_env(paths, attest_ok=True): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert errors == [] + + def test_sha256sums_match_passes(self): + # No attestation, but the release ships SHA256SUMS and the in- + # tree binary's hash matches. + import hashlib as _hashlib + content = b"runs-on-binary" + digest = _hashlib.sha256(content).hexdigest() + sha256sums = f"{digest} main-linux-amd64\n" + with self._mocked_env( + ["action.yml", "main-linux-amd64"], + tag="v2.1.2", + sha256sums_text=sha256sums, + attest_ok=False, + blob_bytes=content, + ): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert errors == [] + + def test_sha256sums_hash_mismatch_fails(self): + # SHA256SUMS lists the file but with a different hash than what + # the in-tree blob actually contains — the binary was tampered + # with after the release was built. Hard fail. + wrong_digest = "0" * 64 + sha256sums = f"{wrong_digest} main-linux-amd64\n" + with self._mocked_env( + ["action.yml", "main-linux-amd64"], + tag="v2.1.2", + sha256sums_text=sha256sums, + blob_bytes=b"ACTUAL CONTENT", + ): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + assert "1 unverified" in errors[0] + + def test_binary_not_listed_in_sha256sums_fails(self): + # SHA256SUMS exists but doesn't list this binary — that's a + # publisher-side gap (forgot to include it in the checksum + # generation). Treat as unverified. + sha256sums = "deadbeef" + "0" * 56 + " some-other-file\n" + with self._mocked_env( + ["action.yml", "main-linux-amd64"], + tag="v2.1.2", + sha256sums_text=sha256sums, + ): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + + def test_attestation_preferred_over_sha256sums(self): + # When both mechanisms are available, attestation wins. Pass an + # intentionally-broken SHA256SUMS to prove the SHA256SUMS path + # was never taken (otherwise we'd hit the mismatch branch). + with self._mocked_env( + ["action.yml", "main-linux-amd64"], + tag="v2.1.2", + sha256sums_text="0" * 64 + " main-linux-amd64\n", + attest_ok=True, + blob_bytes=b"DIFFERENT FROM SHA256SUMS", + ): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert errors == [] + + def test_blob_fetch_failure_fails_that_binary(self): + # If we can't even fetch the binary's bytes, we can't verify — + # treat as unverified. Other binaries that DO fetch are + # evaluated independently. + with self._mocked_env( + ["action.yml", "main-linux-amd64"], + blob_bytes=None, # _fetch_blob_bytes returns None + ): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + assert "1 unverified" in errors[0] + + def test_normal_node_action_passes(self): + # No binaries detected → nothing to verify, no errors. The + # mocks for verification helpers don't need to match anything. + paths = [ + "action.yml", "package.json", "package-lock.json", + "README.md", "tsconfig.json", + "src/main.ts", "src/util.ts", + "dist/index.js", "dist/index.js.map", "dist/licenses.txt", + ] + with self._mocked_env(paths): + assert analyze_in_tree_binaries("org", "repo", "a" * 40) == [] + + def test_composite_action_passes(self): + paths = ["action.yml", "Dockerfile", "scripts/install.sh"] + with self._mocked_env(paths): + assert analyze_in_tree_binaries("org", "repo", "a" * 40) == [] + + def test_single_exe_at_root_fails(self): + with self._mocked_env(["action.yml", "tool.exe", "index.js"]): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + assert "tool.exe" in errors[0] + + def test_jar_in_lib_directory_fails(self): + with self._mocked_env(["action.yml", "lib/runtime.jar"]): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + assert "lib/runtime.jar" in errors[0] + + def test_sub_path_filters_to_subaction(self): + paths = [ + "other-action/main-linux-amd64", + "my-action/action.yml", + "my-action/index.js", + ] + with self._mocked_env(paths): + assert analyze_in_tree_binaries( + "org", "repo", "a" * 40, sub_path="my-action", + ) == [] + + def test_sub_path_flags_within_subaction(self): + paths = [ + "my-action/action.yml", + "my-action/main-linux-amd64", + ] + with self._mocked_env(paths): + errors = analyze_in_tree_binaries( + "org", "repo", "a" * 40, sub_path="my-action", + ) + assert len(errors) == 1 + assert "main-linux-amd64" in errors[0] + + def test_empty_tree_returns_no_errors(self): + with self._mocked_env([]): + assert analyze_in_tree_binaries("org", "repo", "a" * 40) == [] + + def test_many_binaries_truncates_in_message(self): + paths = ["action.yml"] + [ + f"main-linux-amd64-{i}.exe" for i in range(20) + ] + with self._mocked_env(paths): + errors = analyze_in_tree_binaries("org", "repo", "a" * 40) + assert len(errors) == 1 + msg = errors[0] + assert "20 unverified pre-compiled binary" in msg + assert "17 more" in msg + + +class _Patches: + """Tiny context manager that enters/exits a sequence of mock.patch + objects together — used by TestAnalyzeInTreeBinaries to keep the + cascade of patches readable.""" + + def __init__(self, *patches): + self._patches = patches + self._entered: list = [] + + def __enter__(self): + for p in self._patches: + self._entered.append(p.__enter__()) + return self + + def __exit__(self, *args): + for p in reversed(self._patches): + p.__exit__(*args) diff --git a/utils/verify_action_build/security.py b/utils/verify_action_build/security.py index 080a30cd..ee5c5946 100644 --- a/utils/verify_action_build/security.py +++ b/utils/verify_action_build/security.py @@ -18,9 +18,13 @@ # """Security analysis checks for GitHub Actions.""" +import hashlib import json import os import re +import shutil +import subprocess +import tempfile from dataclasses import dataclass from pathlib import Path from typing import Iterator @@ -1625,3 +1629,335 @@ def analyze_repo_metadata( console.print(f" [dim]ℹ[/dim] Org: {org} (not in well-known list)") return warnings + + +# Extensions that strongly indicate a pre-compiled native binary. An action +# shipping any of these in its tree is running opaque executable code that +# the JS-rebuild check cannot reconcile with source. +_IN_TREE_BINARY_EXTENSIONS = ( + ".exe", ".dll", ".so", ".dylib", ".bin", + ".deb", ".rpm", ".pkg", ".dmg", ".msi", ".appimage", + ".o", ".a", ".lib", ".obj", + ".class", ".jar", ".war", + ".wasm", +) + +# Cross-compiled binary naming convention used by Go, Rust, and similar +# toolchains: ``--`` with an optional ``.exe``. Catches +# the runs-on/action shape (main-linux-amd64, main-windows-amd64.exe). +_PLATFORM_ARCH_BINARY_RE = re.compile( + r"-(?:linux|darwin|windows|freebsd|openbsd|netbsd|aix)" + r"-(?:amd64|arm64|386|i386|arm|armv7|riscv64|x86_64|aarch64|ppc64le|s390x)" + r"(?:\.exe)?$" +) + +# Filename patterns that LOOK binary but are conventional in JS/TS or other +# textual sources — don't false-positive these. +_IN_TREE_BINARY_EXEMPT_NAMES = { + # Webpack's licenses txt that ships in dist/ for some actions. + "licenses.txt", +} + + +def _looks_like_in_tree_binary(path: str) -> bool: + """Return True if ``path`` (a repo-relative blob path) looks like a + pre-compiled native binary by name alone. + + Cheap path-only heuristic — no fetch, no magic-byte sniff. Known + binary extensions and cross-compile platform/arch suffixes both + trigger; conventional text artifacts are exempted. + """ + name = path.rsplit("/", 1)[-1] + if name in _IN_TREE_BINARY_EXEMPT_NAMES: + return False + lower = name.lower() + if lower.endswith(_IN_TREE_BINARY_EXTENSIONS): + return True + if _PLATFORM_ARCH_BINARY_RE.search(lower): + return True + return False + + +def _fetch_blob_bytes(org: str, repo: str, commit_hash: str, path: str) -> bytes | None: + """Fetch a file's *raw bytes* from GitHub at ``commit_hash``. + + Uses raw.githubusercontent.com which serves any file size (the + Contents API caps at 1 MB and the in-tree binaries we want to verify + are several MB). Returns ``None`` on any failure. + """ + url = f"https://raw.githubusercontent.com/{org}/{repo}/{commit_hash}/{path}" + try: + resp = requests.get(url, timeout=30) + if resp.ok: + return resp.content + except requests.RequestException: + pass + return None + + +def _fetch_release_asset_text( + org: str, repo: str, tag: str, asset_name: str, +) -> str | None: + """Fetch a named release asset's text content from a tag. + + Two-hop: first the release lookup gives us the asset's API id; then + we GET the asset URL with ``Accept: application/octet-stream`` to + follow the binary redirect. Used to fetch the small text-format + ``SHA256SUMS`` artifact, not the binaries themselves. + """ + url = f"https://api.github.com/repos/{org}/{repo}/releases/tags/{tag}" + headers = {"Accept": "application/vnd.github+json"} + token = os.environ.get("GITHUB_TOKEN") + if token: + headers["Authorization"] = f"Bearer {token}" + try: + resp = requests.get(url, headers=headers, timeout=15) + if not resp.ok: + return None + for asset in resp.json().get("assets", []): + if asset.get("name") != asset_name: + continue + asset_url = asset.get("url") + if not asset_url: + return None + asset_headers = {**headers, "Accept": "application/octet-stream"} + resp2 = requests.get(asset_url, headers=asset_headers, timeout=30) + if resp2.ok: + return resp2.text + return None + except requests.RequestException: + pass + return None + + +def _parse_sha256sums(content: str) -> dict[str, str]: + """Parse standard ``SHA256SUMS`` format (`` ``) into + ``{filename: sha256_hex_lowercase}``. + + Tolerant of: + * `` *`` (binary-mode marker prefix on the name). + * Comments (``#`` at line start) and blank lines. + * Trailing whitespace. + + Lines that don't parse cleanly (wrong length, non-hex digest) are + silently skipped — the caller treats "unknown filename" as a fail + further down. + """ + result: dict[str, str] = {} + for line in content.splitlines(): + stripped = line.strip() + if not stripped or stripped.startswith("#"): + continue + parts = stripped.split(maxsplit=1) + if len(parts) != 2: + continue + digest = parts[0].lower() + name = parts[1].lstrip("*").strip() + if len(digest) == 64 and all(c in "0123456789abcdef" for c in digest): + result[name] = digest + return result + + +def _resolve_tag_for_commit( + org: str, repo: str, commit_hash: str, +) -> str | None: + """Most-specific tag pointing at ``commit_hash``, or None. + + Wraps the existing ``release_lookup._find_tags_for_commit`` so the + binary-verification path doesn't have to reach into that module's + private API directly. Local import avoids a top-level cycle. + """ + from .release_lookup import _find_tags_for_commit + tags = _find_tags_for_commit(org, repo, commit_hash) + return tags[0] if tags else None + + +def _verify_via_gh_attestation( + org: str, content: bytes, +) -> bool: + """Verify a binary's content has a SLSA attestation on GitHub. + + Saves the bytes to a temp file, calls ``gh attestation verify + --owner `` and returns whether the gh CLI succeeded. Returns + False on any failure, including ``gh`` not installed or the + attestations API returning no match for the artifact's digest. + """ + if not shutil.which("gh"): + return False + fd, tmp_path = tempfile.mkstemp(prefix="verify-action-bin-", suffix=".bin") + try: + with os.fdopen(fd, "wb") as f: + f.write(content) + result = subprocess.run( + ["gh", "attestation", "verify", tmp_path, "--owner", org], + capture_output=True, text=True, timeout=60, + ) + return result.returncode == 0 + except (subprocess.TimeoutExpired, subprocess.SubprocessError, OSError): + return False + finally: + try: + os.unlink(tmp_path) + except OSError: + pass + + +def analyze_in_tree_binaries( + org: str, repo: str, commit_hash: str, sub_path: str = "", +) -> list[str]: + """Flag pre-compiled native binaries shipped in the action's tree + that lack verifiable upstream provenance. + + An action that commits a Go/Rust/C binary alongside ``action.yml`` + and exec's it from a small launcher is running opaque executable + code on the runner that no rebuild check can reconcile with source. + Verifiable provenance closes that gap; without it, the binary is + untrustworthy. + + Each detected binary is verified through this cascade (first hit + wins): + + 1. ``gh attestation verify`` — bytes saved to a temp file and + checked against the GitHub attestation transparency log, + scoped to ``--owner ``. Strongest signal: ties the + artifact's SHA256 digest to the workflow run that produced it. + 2. ``SHA256SUMS`` release asset — tag's release asset is fetched + and parsed, and each binary's filename is matched + its + SHA256 hash compared. Slightly weaker (no signature on the + checksum file itself; that's a future enhancement). + + Binaries that pass either step are reported as ✓. Binaries that + pass neither become hard errors. An action that ships in-tree + binaries with no verification chain at all returns one error per + unverified binary; reviewers should treat that as "reject unless + upstream adds provenance". + + Returns a list of error strings (empty = pass). + """ + errors: list[str] = [] + + paths = _list_repo_files(org, repo, commit_hash) + if not paths: + return errors + + prefix = f"{sub_path.rstrip('/')}/" if sub_path else "" + binaries: list[str] = [] + for path in paths: + if prefix and not path.startswith(prefix): + continue + rel = path[len(prefix):] if prefix else path + if _looks_like_in_tree_binary(rel): + binaries.append(rel) + + if not binaries: + return errors + + console.print() + console.rule("[bold]In-tree Binary Check[/bold]") + + # Look up the tag once — both verification paths key off the release + # for this commit. + tag_name = _resolve_tag_for_commit(org, repo, commit_hash) + sha256sums: dict[str, str] | None = None + if tag_name: + text = _fetch_release_asset_text(org, repo, tag_name, "SHA256SUMS") + if text: + sha256sums = _parse_sha256sums(text) + + verified_attestation: list[str] = [] + verified_sha256sums: list[str] = [] + unverified: list[tuple[str, str]] = [] # (path, reason) + + for binary in binaries: + full_path = f"{prefix}{binary}" + content = _fetch_blob_bytes(org, repo, commit_hash, full_path) + if content is None: + unverified.append((binary, "could not fetch from repo")) + continue + + # 1. Attestation (preferred). + if _verify_via_gh_attestation(org, content): + verified_attestation.append(binary) + continue + + # 2. SHA256SUMS fallback. + if sha256sums is not None: + name = binary.rsplit("/", 1)[-1] + expected = sha256sums.get(name) + if expected is None: + unverified.append(( + binary, + f"not listed in SHA256SUMS at release {tag_name}", + )) + continue + actual = hashlib.sha256(content).hexdigest() + if actual != expected: + unverified.append(( + binary, + f"SHA256 mismatch (expected {expected[:12]}…, got {actual[:12]}…)", + )) + continue + verified_sha256sums.append(binary) + continue + + # Neither mechanism available. + if tag_name: + reason = ( + f"no SLSA attestation and release {tag_name} has no SHA256SUMS" + ) + else: + reason = ( + "no tag found for this commit; cannot fetch release SHA256SUMS" + ) + unverified.append((binary, reason)) + + # Summary block. + if verified_attestation: + console.print( + f" [green]✓[/green] {len(verified_attestation)} binary(ies) " + f"verified via gh attestation (SLSA provenance):" + ) + for path in verified_attestation: + console.print(f" [green]✓[/green] {path}") + if verified_sha256sums: + console.print( + f" [green]✓[/green] {len(verified_sha256sums)} binary(ies) " + f"verified against [bold]{tag_name}[/bold] release SHA256SUMS:" + ) + for path in verified_sha256sums: + console.print(f" [green]✓[/green] {path}") + if unverified: + console.print( + f" [red]✗[/red] {len(unverified)} pre-compiled binary file(s) " + f"could not be verified:" + ) + for path, reason in unverified[:10]: + console.print(f" [red]{path}[/red]: [dim]{reason}[/dim]") + if len(unverified) > 10: + console.print(f" [dim] ... and {len(unverified) - 10} more[/dim]") + + if not unverified: + return errors + + console.print( + " [dim]Action ships opaque executable code without verifiable " + "provenance for[/dim]" + ) + console.print( + " [dim]the listed binaries. Reject unless upstream adds SLSA " + "attestations[/dim]" + ) + console.print( + " [dim](actions/attest-build-provenance) or a SHA256SUMS file at " + "the release.[/dim]" + ) + + sample = ", ".join(p for p, _ in unverified[:3]) + if len(unverified) > 3: + sample += f", ... ({len(unverified) - 3} more)" + errors.append( + f"action ships {len(unverified)} unverified pre-compiled binary " + f"file(s) in-tree ({sample}) — no SLSA attestation and no " + f"matching SHA256SUMS at the release" + ) + return errors diff --git a/utils/verify_action_build/verification.py b/utils/verify_action_build/verification.py index e155b861..0c1a7b7f 100644 --- a/utils/verify_action_build/verification.py +++ b/utils/verify_action_build/verification.py @@ -44,6 +44,7 @@ analyze_binary_downloads_recursive, analyze_dependency_pinning, analyze_dockerfile, + analyze_in_tree_binaries, analyze_lock_files, analyze_nested_actions, analyze_repo_metadata, @@ -188,6 +189,7 @@ def verify_single_action( matched_with_approved_lockfile = False binary_download_failures: list[str] = [] lock_file_errors: list[str] = [] + in_tree_binary_errors: list[str] = [] # Detect source-detached release tags (orphan commits containing only # distributable artifacts) and resolve the default-branch source commit @@ -319,6 +321,25 @@ def verify_single_action( "all detected manifests have lock files (or are exempted)", )) + # In-tree binary check: flag pre-compiled native binaries shipped + # directly in the action's repo. These are opaque executable code + # that the JS-rebuild check cannot reconcile with source — the + # launcher matches, the binary doesn't. See runs-on/action@v2.1.x + # for the canonical case. + in_tree_binary_errors = analyze_in_tree_binaries( + org, repo, commit_hash, sub_path, + ) + if in_tree_binary_errors: + checks_performed.append(( + "In-tree binary check", "fail", + "unverified binaries in repo (no SLSA attestation / SHA256SUMS)", + )) + else: + checks_performed.append(( + "In-tree binary check", "pass", + "no in-tree binaries (or all verified via attestation / SHA256SUMS)", + )) + if not is_js_action: console.print() console.print( @@ -532,7 +553,12 @@ def verify_single_action( ci_mode=ci_mode, ) - overall_passed = all_match and not binary_download_failures and not lock_file_errors + overall_passed = ( + all_match + and not binary_download_failures + and not lock_file_errors + and not in_tree_binary_errors + ) console.print() checklist_hint = f"\n[dim]Security review checklist: {SECURITY_CHECKLIST_URL}[/dim]" @@ -580,6 +606,13 @@ def verify_single_action( f"{len(lock_file_errors)} manifest(s) without a matching lock file " f"(transitive dependencies not pinned; rebuilds cannot be reproduced)[/red bold]" ) + elif in_tree_binary_errors: + fail_msg = ( + f"[red bold]{action_type} action — " + f"{len(in_tree_binary_errors)} unverified pre-compiled binary(ies) " + f"in repo (no SLSA attestation, no matching SHA256SUMS at release)" + f"[/red bold]" + ) else: fail_msg = f"[red bold]{action_type} action — verification failed[/red bold]" console.print(