Skip to content

CI: split verify-pr.yaml into per-stack workflows with path filters #301

Description

@WhatIfWeDigDeeper

Problem

.github/workflows/verify-pr.yaml is a single workflow that runs every step on every PR. Two pain points:

  1. Postgres service container boots on every PR — required by test:rails-api (added in feat: add Ruby on Rails API #300) and the other DB-backed stacks (spring-api, fastapi, go-api). Docs-only PRs short-circuit via the docs-check step today, but a PR that only touches, say, angular-ui still spins up Postgres for nothing.
  2. One failure blocks everything — a flake in test:spring-api (Gradle/JVM) fails the whole verify-pr check even on a Vue-only PR.

Reference for the path-filtered split pattern: agent-skills/.github/workflows/test-learn-skill.yml.

Proposed shape — three layers

  1. verify-cross-cutting.yaml — always runs, no path filter. Handles things that must fail-fast on every PR regardless of what changed:

    • audit:ci:all (vulnerabilities can land at any time as the advisory DB updates)
    • cspell
    • lint of root configs
    • proto:breaking check
    • No DB, no language toolchain installs.
    • This is the only check listed as required in branch protection.
  2. verify-stack-<name>.yaml — one per stack, with paths: filters covering the stack dir and the shared trigger set:

    • <stack-dir>/**
    • package.json, package-lock.json
    • scripts/**
    • tests/api/**, tests/e2e/**
    • .github/workflows/verify-stack-<name>.yaml (so editing the workflow re-triggers it)
    • Only the four DB-backed stacks (rails-api, spring-api, fastapi, go-api) declare the Postgres services: block.
    • Each per-stack workflow installs only its own toolchain (Ruby for rails, Java for spring, uv for fastapi, Go for go-api, Node for everything else).
  3. No branch-protection requirement on per-stack jobs — they fail visibly in the PR check list and we treat them as blocking by convention. Required + path-filtered is the trap: the check never reports for skipped PRs, so the PR can't merge. The "always-pass sentinel" workaround exists but adds yet more YAML.

Cross-cutting risk worth eyes-open accepting

audit:ci:all lives in cross-cutting and runs every time → good (vuln fail-fast). But per-stack lint/build/test only fire when their stack's paths change. If someone edits tests/api/helpers.ts and forgets to add that path to every verify-stack-*.yaml's paths:, half the stacks won't validate the change.

Mitigations to choose from:

  • A copy-pasted paths: block in every per-stack workflow (verbose but greppable).
  • Generate the workflows from a template (Make/script + .j2 or similar).
  • Periodic full-matrix run via workflow_dispatch or a nightly cron, just as a safety net.

Suggested scope for the first PR

To keep blast radius small, start with just the four DB-backed stacks:

  • Add verify-stack-rails.yaml, verify-stack-spring.yaml, verify-stack-fastapi.yaml, verify-stack-go.yaml.
  • Move the Postgres services: block, the language toolchain setup, and the relevant npm run build/lint/test:<stack> invocations out of verify-pr.yaml.
  • Leave the remaining stacks (UI suites, koa-api, hono-api, nuxt-api, nest-api, nest-history-api, lambda-api, yoga-api) in verify-pr.yaml for now under the existing docs-check short-circuit.
  • That captures most of the "Docker on every PR" savings without producing 12 new YAML files to maintain.

A follow-up PR can split the remaining stacks once we've validated the path-filter approach in practice.

Acceptance criteria

  • A docs-only PR no longer triggers Postgres container startup or Java/Ruby/Python/Go toolchain installs.
  • A PR that only touches angular-ui/ does not run the Spring or Rails build/test.
  • A PR that touches tests/api/helpers.ts triggers every DB-backed stack's tests.
  • audit:ci:all still runs on every PR.
  • Branch protection still blocks merges on cross-cutting failures.
  • Documentation in CLAUDE.md updated to describe the per-stack workflow pattern, including the "remember to add new shared paths to every workflow" reminder.

Context

Originally raised during #300 review — the rails-api PR added a Postgres service container to verify-pr.yaml to support RSpec, surfacing the cost of running it on every PR. Decision was to keep #300 focused on adding the stack and split the workflow restructure into its own PR (this issue).

Cleanup opportunity: stale workflow-level env vars

The top-level env: block in verify-pr.yaml carries two Spring-only settings that are no longer reachable from any step the workflow runs:

env:
  NVD_API_KEY: ${{ secrets.NVD_API_KEY }}
  # OWASP dependency-check requires additional heap on CI runners.
  GRADLE_OPTS: -Xmx4g -XX:MaxMetaspaceSize=1g -Dorg.gradle.daemon=false
  • audit:ci:spring-api (the OWASP dependencyCheckAnalyze invocation that needs both) was removed from audit:ci:all (package.json:30) — it now runs out-of-band.
  • audit:ci:spring-api already defines its own NVD_API_KEY fallback (grep .env) and its own -Dorg.gradle.jvmargs inline, so even when it does run it doesn't depend on the workflow-level env.
  • Spring's test/build steps don't need the 4 GB heap — that boost exists specifically for the OWASP NVD scan.

When verify-stack-spring.yaml is created, drop the top-level env: block from verify-pr.yaml. If Spring's audit ever returns to the always-run path, give the new workflow its own scoped env: block rather than putting it back on the cross-cutting workflow.

Cleanup opportunity 2: duplicate Ruby bundle install

verify-pr.yaml runs Ruby's bundle install twice on every PR:

  1. ruby/setup-ruby@v1 with bundler-cache: true runs bundle install after restoring the vendor/bundle cache.
  2. npm run ci:all later runs ci:rails-api, which is bin/with-ruby bundle install --jobs 4 --retry 3.

The second run is fast on a cache hit (~2s) but still redundant; on a cache miss it duplicates the full install (~30s). It exists because the workflow runs the Security Audit step (audit:ci:all) before the Install npm packages step, and audit:ci:rails-api invokes bundle exec bundler-audit, which requires gems to already be installed — so bundler-cache: true is the only reason the audit step works at all.

Two fix paths in the per-stack split:

  • Option A — drop bundler-cache: true and reorder: run Install before Audit. ci:rails-api becomes the single bundle-install path, consistent with how ci:fastapi, ci:go, etc. install before audit. Loses the GitHub Actions cache layer for Ruby gems; ~25s slower per CI run.
  • Option B — keep bundler-cache: true for the cache, change ci:rails-api to bundle check (no-op verify) for CI consistency. Diverges from sibling ci:* semantics (ci:fastapi = uv sync, ci:react-ui = npm ci, etc.) which all install rather than verify.

Lean toward A when the per-stack split lands — the cache cost is modest and the convention parity is worth more than the saved seconds. Either way the duplication should not survive into the new verify-stack-rails.yaml.

Surfaced as a Copilot review comment on #300; declined there in favor of folding into this restructure.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions