fix(scale-up): Add ec2:TerminateInstances permission to scale-up Lambda IAM policy #5152

Merged: edersonbrilhante merged 6 commits into main from copilot/scale-up-lambda-fix-termination-permission on Jun 11, 2026

Conversation

Copilot AI commented Jun 10, 2026

Contributor

Description

The scale-up Lambda calls terminateRunner(instanceId) when JIT configuration fails, but lacks the ec2:TerminateInstances IAM permission. This leaves orphaned EC2 instances running when JIT setup errors occur.

Adds ec2:TerminateInstances to modules/runners/policies/lambda-scale-up.json as two statements, each scoped by a tag condition, matching the pattern used in the scale-down Lambda policy:

  1. Scoped by ghr:Application tag (hardcoded value applied at instance creation):
{
    "Effect": "Allow",
    "Action": ["ec2:TerminateInstances"],
    "Resource": ["*"],
    "Condition": {
        "StringEquals": {
            "ec2:ResourceTag/ghr:Application": "github-action-runner"
        }
    }
}
  2. Scoped by gh:environment tag (environment-specific):
{
    "Effect": "Allow",
    "Action": ["ec2:TerminateInstances"],
    "Resource": ["*"],
    "Condition": {
        "StringEquals": {
            "ec2:ResourceTag/gh:environment": "${environment}"
        }
    }
}
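
Because these are two separate Allow statements, IAM treats them as alternatives: a terminate call is permitted when the target instance satisfies either tag condition, while multiple keys inside a single StringEquals block would all have to match. The sketch below illustrates that combination in Python. It is only an illustration, not how IAM or the Lambda is implemented: the helper names `statement_matches` and `is_terminate_allowed` and the sample value `my-prefix` are assumptions, and real evaluation also considers explicit denies, permission boundaries, and organization policies, which are ignored here.

```python
from typing import Dict, List

# The two statements from the description. "${environment}" is shown already
# rendered to "my-prefix", which is an assumed sample value.
STATEMENTS: List[dict] = [
    {
        "Effect": "Allow",
        "Action": ["ec2:TerminateInstances"],
        "Resource": ["*"],
        "Condition": {
            "StringEquals": {
                "ec2:ResourceTag/ghr:Application": "github-action-runner"
            }
        },
    },
    {
        "Effect": "Allow",
        "Action": ["ec2:TerminateInstances"],
        "Resource": ["*"],
        "Condition": {
            "StringEquals": {"ec2:ResourceTag/gh:environment": "my-prefix"}
        },
    },
]

TAG_PREFIX = "ec2:ResourceTag/"


def statement_matches(statement: dict, action: str, tags: Dict[str, str]) -> bool:
    """Return True if one Allow statement covers the action for these tags."""
    if statement["Effect"] != "Allow" or action not in statement["Action"]:
        return False
    # Every key inside one StringEquals block must match (logical AND).
    for key, expected in statement["Condition"]["StringEquals"].items():
        tag_name = key[len(TAG_PREFIX):]
        if tags.get(tag_name) != expected:
            return False
    return True


def is_terminate_allowed(tags: Dict[str, str]) -> bool:
    """Separate Allow statements combine as alternatives (logical OR)."""
    return any(
        statement_matches(s, "ec2:TerminateInstances", tags) for s in STATEMENTS
    )
```

In this simplified model, an instance tagged only with `ghr:Application` and an instance tagged only with the matching `gh:environment` are both terminable, while an instance carrying neither tag, or a different environment value, is not.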

Also passes var.prefix as the environment template variable to the policy template in scale-up.tf, so that ${environment} resolves when the policy is rendered.

Test Plan

  • Verified the policy JSON is valid and follows the existing conditional scoping pattern from lambda-scale-down.json
  • Both tag-based conditions (ghr:Application and gh:environment) match the scale-down policy exactly
  • The environment template variable is passed as var.prefix, consistent with how scale-down passes it
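
The checks above could be approximated locally along these lines. This is a rough sketch under stated assumptions, not the verification that was actually run: Python's `string.Template` stands in for Terraform's template rendering because both accept the `${environment}` placeholder form, the `Version`/`Statement` wrapper around the statement is assumed, and `render_policy` and the sample prefix are hypothetical.

```python
import json
from string import Template

# Assumed minimal policy document containing only the environment-scoped
# statement; the real lambda-scale-up.json holds more statements.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:TerminateInstances"],
      "Resource": ["*"],
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/gh:environment": "${environment}"
        }
      }
    }
  ]
}
"""


def render_policy(prefix: str) -> dict:
    """Substitute the placeholder, then parse to prove the result is valid JSON."""
    # substitute() raises KeyError if a placeholder has no value, which mirrors
    # the failure mode of forgetting to pass the variable to the template.
    rendered = Template(POLICY_TEMPLATE).substitute(environment=prefix)
    return json.loads(rendered)


if __name__ == "__main__":
    policy = render_policy("my-prefix")
    condition = policy["Statement"][0]["Condition"]["StringEquals"]
    print(condition["ec2:ResourceTag/gh:environment"])
```

Parsing the rendered text with `json.loads` would also catch syntax slips such as a trailing comma inside the `Action` array.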

Related Issues

Copilot AI changed the title from "[WIP] Fix scale-up Lambda missing ec2:TerminateInstances permission" to "Add ec2:TerminateInstances permission to scale-up Lambda IAM policy" on Jun 10, 2026
Copilot AI requested a review from Brend-Smits June 10, 2026 13:05
@Brend-Smits Brend-Smits changed the title from "Add ec2:TerminateInstances permission to scale-up Lambda IAM policy" to "fix(iam-policy): Add ec2:TerminateInstances permission to scale-up Lambda IAM policy" on Jun 10, 2026
@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@Brend-Smits Brend-Smits changed the title from "fix(iam-policy): Add ec2:TerminateInstances permission to scale-up Lambda IAM policy" to "fix(scale-up): Add ec2:TerminateInstances permission to scale-up Lambda IAM policy" on Jun 10, 2026
@edersonbrilhante

edersonbrilhante commented Jun 10, 2026

Contributor

@copilot @Brend-Smits it is missing

    {
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/gh:environment": "${environment}"
        }
      }
    }

@Brend-Smits

Contributor

@copilot have a look at the comment from #5152 (comment) and fix the problem.

Copilot AI commented Jun 10, 2026

Contributor Author

@copilot have a look at the comment from #5152 (comment) and fix the problem.

Added the missing ec2:TerminateInstances statement scoped by ec2:ResourceTag/gh:environment and passed the environment variable (var.prefix) to the template — matching the scale-down policy pattern exactly.

@Brend-Smits Brend-Smits marked this pull request as ready for review June 10, 2026 13:14
@Brend-Smits Brend-Smits requested a review from a team as a code owner June 10, 2026 13:14
@Brend-Smits

Contributor

PR looks decent. @edersonbrilhante, what do you think?

@edersonbrilhante

Contributor

Yep. It should be fine now.

@edersonbrilhante

Contributor

@copilot fix the conflicts

@Brend-Smits

Contributor

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Jun 10, 2026

Contributor Author

@copilot resolve the merge conflicts in this pull request

Resolved the merge conflict in modules/runners/scale-up.tf — kept the updated arn_runner_instance_role from main while preserving the environment variable addition.

@Brend-Smits Brend-Smits enabled auto-merge (squash) June 10, 2026 15:42
@github-actions github-actions Bot requested a review from a team as a code owner June 10, 2026 15:42
@edersonbrilhante edersonbrilhante enabled auto-merge (squash) June 10, 2026 17:33
Brend-Smits pushed a commit that referenced this pull request Jun 11, 2026
The `update-docs` workflow creates docs commits with a hand-set bot
identity (`github-aws-runners-pr|bot`) while pushing with
`GITHUB_TOKEN`. The resulting commits are unsigned, show as unverified,
and don't resolve to any GitHub account (see
[dd5f00c](dd5f00c)
on #5152 — `verification.reason: unsigned`, no linked author).

Key changes:

- The terraform-docs step no longer commits or pushes (`git-push:
false`); a new step creates the commit through the GraphQL
`createCommitOnBranch` mutation instead. Commits created via the API are
signed by GitHub and show as **Verified**, attributed to
`github-actions[bot]`. This avoids adding a new third-party action
dependency.
- The main-branch PR path enables `sign-commits: true` on
`peter-evans/create-pull-request`, which also creates commits via the
API for the same effect.
- The fork path is unchanged, since forks push to their own
repositories.

Notes:

- Pushing via `createCommitOnBranch` with `GITHUB_TOKEN` does not
trigger new workflow runs, matching the previous `git push` behavior (no
recursion).
- The `mkdocs gh-deploy` commits to `gh-pages` remain unsigned; mkdocs
commits locally via git, so it can't use the API. Its identity
(`github-actions[bot]`) at least matches the token used.

Validated with actionlint and zizmor (no findings on this workflow).

## Testing

Tested end-to-end on a throwaway branch
([`gc/test/verified-docs-commits`](https://github.com/github-aws-runners/terraform-aws-github-runner/tree/gc/test/verified-docs-commits))
containing this workflow plus a deliberately stale doc (a tweaked
variable description in `modules/webhook/variables.tf` without
regenerating the READMEs):

- **No-change path**: the run on this PR's branch exits cleanly with `No
documentation changes to commit.` ([run
27298488008](https://github.com/github-aws-runners/terraform-aws-github-runner/actions/runs/27298488008))
- **Commit path**: the run on the test branch regenerated 4 READMEs and
created the commit via the API ([run
27303747943](https://github.com/github-aws-runners/terraform-aws-github-runner/actions/runs/27303747943))
— see
[5116de7](5116de7),
which shows as **Verified**:

```console
$ gh api repos/github-aws-runners/terraform-aws-github-runner/commits/5116de7ddf1a45c52ec2b5ce2b702d7f0dbcf5d2 \
    --jq '{author: .commit.author, committer_login: .committer.login, verified: .commit.verification.verified, reason: .commit.verification.reason}'
{
  "author": {
    "date": "2026-06-10T20:19:54Z",
    "email": "41898282+github-actions[bot]@users.noreply.github.com",
    "name": "github-actions[bot]"
  },
  "committer_login": "web-flow",
  "verified": true,
  "reason": "valid"
}
```

Compare with the unverified commit that motivated this PR: `dd5f00c` has
`"verified": false, "reason": "unsigned"` and no resolvable author
account.

Testing also surfaced two fixes now included here: terraform-docs stages
the files it updates, so the change detection diffs against `HEAD`; and
the GraphQL payload is passed to `jq` via a temp file (`--slurpfile`)
because embedding file contents in a CLI argument exceeded the kernel's
per-argument size limit.
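
The temp-file approach described above can be sketched roughly as follows. This is an illustration in Python rather than the workflow's actual shell and `jq` steps: the helper names `build_request` and `write_request`, the sample repository, branch, and file values are all assumptions, while the input field names follow GitHub's `CreateCommitOnBranchInput` type, where file contents are base64-encoded.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Dict

MUTATION = """
mutation ($input: CreateCommitOnBranchInput!) {
  createCommitOnBranch(input: $input) {
    commit { oid }
  }
}
"""


def build_request(
    repo: str, branch: str, head_oid: str, headline: str, files: Dict[str, bytes]
) -> dict:
    """Build the GraphQL request body; file contents must be base64 strings."""
    additions = [
        {"path": path, "contents": base64.b64encode(data).decode("ascii")}
        for path, data in sorted(files.items())
    ]
    return {
        "query": MUTATION,
        "variables": {
            "input": {
                "branch": {"repositoryNameWithOwner": repo, "branchName": branch},
                "expectedHeadOid": head_oid,
                "message": {"headline": headline},
                "fileChanges": {"additions": additions},
            }
        },
    }


def write_request(request: dict) -> Path:
    """Write the body to a temp file so it never travels as a CLI argument."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(request, handle)
        return Path(handle.name)
```

Keeping the payload in a file means its size is bounded by disk space rather than by the operating system's limit on a single command-line argument; an HTTP client can then read the request body from that path.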

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@Brend-Smits Brend-Smits force-pushed the copilot/scale-up-lambda-fix-termination-permission branch from 958b562 to 6c92ebb June 11, 2026 07:29
@edersonbrilhante edersonbrilhante merged commit 94c4e12 into main Jun 11, 2026
40 of 41 checks passed
@edersonbrilhante edersonbrilhante deleted the copilot/scale-up-lambda-fix-termination-permission branch June 11, 2026 07:29
Brend-Smits pushed a commit that referenced this pull request Jun 11, 2026
🤖 I have created a release *beep* *boop*
---


## [7.7.0](v7.6.1...v7.7.0) (2026-06-11)


### Features

* Add feature to enable dynamic ec2 config via workflow labels
([#5003](#5003))
([c68445d](c68445d))
* add support for macos runners
([#4930](#4930))
([3e179a3](3e179a3))
* Introduce Amazon Linux 2023 ARM image
([#4780](#4780))
([e572ae5](e572ae5))
* relax cpu_options schema and add amd_sev_snp + nested_virtualization
support
([#5039](#5039))
([5a3746d](5a3746d))
* **runner-role:** Enable using separate IAM role for runners
([#4875](#4875))
([6642e57](6642e57))


### Bug Fixes

* **ci:** sign auto-generated docs commits
([#5154](#5154))
([a6af4d2](a6af4d2))
* **runners:** wire job_retry.lambda_memory_size and lambda_timeout
([#5120](#5120))
([404785e](404785e))
* **scale-up:** Add ec2:TerminateInstances permission to scale-up Lambda
IAM policy
([#5152](#5152))
([94c4e12](94c4e12))
* **scale-up:** prevent negative TotalTargetCapacity when runners exceed
maximum
([#5062](#5062))
([9ab7410](9ab7410))
* **webhook:** Fix publish events to EventBridge
([#5143](#5143))
([a72b737](a72b737))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: runners-releaser[bot] <194412594+runners-releaser[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Scale-up Lambda lacks ec2:TerminateInstances permission for JIT error cleanup

3 participants