ci: targeted builds and runner reliability improvements #207

Open
krisrice wants to merge 3 commits into main from ci/build-workflow-improvements

@krisrice
Member

Problem

Every PR triggered a full build across all ~22 MCP servers, regardless of which server was actually changed. A PR touching one server would spin up 22 runners, download ~80 packages each, and run for 60+ seconds — wasteful and slow.

Additionally, the builds were fragile: a single flaky runner (e.g. a PyPI network timeout mid-install) would cascade-cancel all other matrix jobs.

Changes

1. Affected-files-only builds on PRs

get-directories now uses git diff against the PR base SHA to detect which src/<server>/ directories were actually changed, and only runs those matrix jobs.

| Scenario | Builds triggered |
|---|---|
| PR touches src/oci-compute-mcp-server/** only | oci-compute-mcp-server only |
| PR touches requirements*.txt, .github/, or Makefile | all servers (infra change) |
| Push to main | all servers (full integration check) |
| PR touches only docs/READMEs at root | none (matrix skipped) |
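
The rules in the table can be sketched as a small shell function. This is only an illustration, not the workflow's actual script: `select_dirs` is a hypothetical name, the list of changed files is assumed to arrive on stdin (for example from `git diff --name-only`), and it is assumed that only root-level `requirements*.txt` files count as an infra change and that the matrix takes bare server names.

```shell
# Sketch only: select_dirs is a hypothetical helper, not the PR's script.
# stdin: changed file paths, one per line.
# $1:    newline-separated list of every server directory name under src/.
# stdout: the server directories to build, one per line (empty for docs-only).
select_dirs() (
  all="$1"
  files=$(cat)
  # Infra paths (assumed root-level) force a build of every server.
  if printf '%s\n' "$files" | grep -Eq '^(requirements[^/]*\.txt|\.github/.*|Makefile)$'; then
    printf '%s\n' "$all"
  else
    # Keep the <server> component of src/<server>/..., then de-duplicate.
    printf '%s\n' "$files" | sed -n 's|^src/\([^/]*\)/.*|\1|p' | sort -u
  fi
)
```

A docs-only change matches neither branch, so the function prints nothing, which corresponds to the "none (matrix skipped)" row.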

2. pip caching (cache: 'pip')

Added to both build and combined-coverage jobs. After the first run, the ~80 shared packages (fastmcp, oci, pydantic, etc.) are served from the runner cache instead of re-downloaded from PyPI on every job.

3. fail-fast: false on the matrix

Previously one flaky runner (e.g. a network timeout) would cascade-cancel all other matrix jobs. Now each job fails independently.

4. timeout-minutes: 15 on the build job

Replaces GitHub's 6-hour default with a hard ceiling appropriate for these builds.

Three runner reliability improvements:
- cache: 'pip' on both build and combined-coverage jobs — prevents 22+
  parallel matrix jobs from cold-downloading the same ~80 packages from
  PyPI simultaneously (thundering herd causing network timeouts)
- timeout-minutes: 15 on build job — gives a clear failure signal instead
  of hanging for GitHub's 6-hour default
- fail-fast: false on matrix strategy — one flaky runner no longer
  cascades and cancels all other matrix jobs

Previously every PR triggered builds for all ~22 servers regardless of
what changed. Now get-directories uses git diff against the PR base SHA
to detect which src/<server>/ directories were actually touched and only
runs those matrix jobs.

Rules:
- PR touching src/oci-compute-mcp-server/* → builds only oci-compute-mcp-server
- PR touching requirements*.txt, .github/, or Makefile → builds all
  (infra change could affect every server)
- Push to main → builds everything (full integration check)

fetch-depth: 0 added to get-directories checkout so git diff has the
full history needed to compare against the base commit.
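
To illustrate why the full history matters, here is a minimal sketch of the comparison step. `changed_files_since` is a hypothetical name, and the base SHA is assumed to be passed in by the workflow (for instance from the pull request event payload). It checks that the base commit exists locally before diffing, since a shallow checkout would normally not contain it.

```shell
# Sketch only: changed_files_since is a hypothetical helper.
# $1: the PR base commit SHA. Prints the paths that differ between it and HEAD.
changed_files_since() (
  base_sha="$1"
  # With a shallow clone the base commit is usually missing, so fail clearly.
  if ! git cat-file -e "${base_sha}^{commit}" 2>/dev/null; then
    echo "base commit ${base_sha} is not in the local history" >&2
    return 1
  fi
  git diff --name-only "$base_sha" HEAD
)
```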

oracle-contributor-agreement (bot) added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Apr 18, 2026
krisrice enabled auto-merge (squash) April 18, 2026 14:14

Runs on every push to main and daily at 06:00 UTC. Checks all open
non-draft PRs for merge conflicts and applies or removes the 'needs
rebase' label automatically so conflict status is visible in the PR
list without drilling into each PR.
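
The apply-or-remove decision described above could look roughly like the sketch below. `label_action` is a hypothetical helper, and it assumes the mergeable state is reported as MERGEABLE, CONFLICTING, or UNKNOWN (the values GitHub's GraphQL API uses); the actual workflow may obtain and represent this differently.

```shell
# Sketch only: label_action is a hypothetical helper.
# $1: mergeable state (MERGEABLE, CONFLICTING, or UNKNOWN).
# $2: "true" if the PR already carries the 'needs rebase' label, else "false".
# Prints the action to take: add, remove, or none.
label_action() {
  case "$1:$2" in
    CONFLICTING:false) echo add ;;     # conflict found, label missing
    MERGEABLE:true)    echo remove ;;  # conflict resolved, label stale
    *)                 echo none ;;    # already correct, or state not yet known
  esac
}
```

Treating UNKNOWN as "no change" avoids flipping the label while GitHub is still computing mergeability.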
DIRS="$ALL"
fi

directories=$(echo "$DIRS" | jq -R -s -c 'split("\n") | map(select(length > 0))')
Member

@gebhardtr gebhardtr Apr 27, 2026

This can emit an empty list; will that break the build if only excluded paths or docs are updated?
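
As a sketch of the case raised here: with an empty `DIRS`, the jq filter in the hunk above produces `[]`, and GitHub Actions typically rejects a matrix whose vector has no values rather than skipping it. One possible guard, assuming hypothetical helper names `dirs_to_json` and `has_directories`, is to test the JSON before the matrix job is allowed to run. Whether the PR already handles this elsewhere is not visible in this hunk.

```shell
# Sketch only: both function names are hypothetical.
# dirs_to_json applies the same jq filter as the hunk above to $1.
dirs_to_json() {
  printf '%s\n' "$1" | jq -R -s -c 'split("\n") | map(select(length > 0))'
}

# has_directories succeeds only when the JSON array has at least one entry.
has_directories() {
  [ -n "$1" ] && [ "$1" != "[]" ]
}
```

A step could write the result to `$GITHUB_OUTPUT` (for example as `has_changes=true` or `has_changes=false`) so that the matrix job can be gated with an `if:` condition on that output.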

@krisrice
Member Author

Review status: not approvable yet.

Blocking issue:

  • .github/workflows/label-conflicts.yml sets job permissions to only pull-requests: write but also runs actions/checkout@v4. In GitHub Actions, once permissions is declared, unspecified scopes are unavailable; checkout needs contents: read unless the checkout step is removed.

Steps to make it approvable:

  1. Either add contents: read under the job permissions or remove the checkout step if the script does not need a workspace.
  2. Re-run the workflow checks and confirm they pass.
  3. Confirm the conflict-label workflow still has only the minimum permissions it needs: pull-requests: write for PR mergeability and labels, plus contents: read only if checkout remains.
