fix(runners): wire job_retry.lambda_memory_size and lambda_timeout by oscarbc96 · Pull Request #5120 · github-aws-runners/terraform-aws-github-runner

oscarbc96 · 2026-05-10T10:43:25Z

Description

Both var.job_retry (in modules/multi-runner/variables.tf and modules/runners/variables.tf) declare lambda_memory_size and lambda_timeout as documented configuration fields, but local.job_retry in modules/runners/job-retry.tf never copies either field into the config map passed to the inner job-retry / lambda sub-modules. The inner lambda module then falls back to its defaults (memory_size = 256, timeout = 60), so user-supplied values are silently dropped — tofu plan shows no diff and the running Lambda keeps its defaults.

The fix is a two-line addition to the local.job_retry map. It mirrors the pattern modules/runners/ssm-housekeeper.tf already uses for local.ssm_housekeeper.lambda_memory_size / local.ssm_housekeeper.lambda_timeout — that Lambda correctly threads the values through.
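As a rough illustration of the wiring described above, the sketch below reduces the problem to a standalone configuration. It is not the module's actual source: the real local.job_retry map carries other fields that are omitted here, the variable declaration is trimmed to the two attributes in question (using the defaults quoted under Verification), and the output name job_retry_lambda_config is hypothetical, added only so the result can be inspected.

```hcl
terraform {
  # optional() with a default value requires Terraform >= 1.3 (or OpenTofu).
  required_version = ">= 1.3.0"
}

# Trimmed stand-in for var.job_retry; the real declaration has more attributes.
variable "job_retry" {
  type = object({
    lambda_memory_size = optional(number, 256)
    lambda_timeout     = optional(number, 30)
  })
  default = {}
}

locals {
  # Simplified stand-in for local.job_retry in modules/runners/job-retry.tf.
  # The two assignments below are the wiring this PR describes adding.
  job_retry = {
    memory_size = var.job_retry.lambda_memory_size
    timeout     = var.job_retry.lambda_timeout
  }
}

# Hypothetical output, present only to make the resolved values visible.
output "job_retry_lambda_config" {
  value = local.job_retry
}
```

Under these assumptions, a plan with no input should resolve to memory_size = 256 and timeout = 30, and setting job_retry = { lambda_memory_size = 512 } should change only memory_size. Without the two assignments in the locals block, neither override would reach whatever consumes local.job_retry.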

Motivation

Discovered in production: I pinned lambda_memory_size = 512 in multi_runner_config[*].runner_config.job_retry after observing the job-retry Lambdas at 87% memory utilisation (223 MB peak on the 256 MB default), and got No changes from tofu plan. Tracing the wiring confirmed the value never reaches the resource.

Reproduction

module "runners" {
  source  = "github-aws-runners/github-runner/aws//modules/multi-runner"
  version = "7.6.0"
  # …
  multi_runner_config = {
    "example" = {
      matcherConfig = { … }
      runner_config = merge(local.default_config, {
        # … other config …
        job_retry = {
          enable             = true
          lambda_memory_size = 512  # ← silently ignored before this PR
          lambda_timeout     = 60   # ← silently ignored before this PR
        }
      })
    }
  }
}

After this fix, tofu plan shows the expected memory_size: 256 -> 512 change on the job-retry Lambda.

Verification

tofu fmt clean.
The variable type definitions in both modules/runners/variables.tf and modules/multi-runner/variables.tf already declare lambda_memory_size = optional(number, 256) and lambda_timeout = optional(number, 30), so no public surface changes.
The inner modules/runners/modules/lambda accepts memory_size and timeout on its lambda input object (with the same defaults), so when the wiring is restored the values flow through naturally.

No-impact when not set

Defaults remain memory_size = 256 (per the variable declaration) and timeout = 30 — same as today's effective values when nothing is overridden.

The job_retry variable on both the multi-runner and runners modules declares lambda_memory_size and lambda_timeout, but the local.job_retry map in modules/runners/job-retry.tf never copied either field into the config passed to the inner job-retry / lambda sub-modules. The inner lambda module fell back to its defaults (memory_size = 256, timeout = 60), so user-supplied values were silently dropped.

The fix mirrors the pattern already used by ssm-housekeeper.tf (local.ssm_housekeeper.lambda_memory_size / local.ssm_housekeeper.lambda_timeout): the ssm-housekeeper Lambda correctly threads the values through; the job-retry one did not.

Observed in production: a deployment pinned to lambda_memory_size = 512 in multi_runner_config[*].runner_config.job_retry produced no plan diff because the value never reached the resource. The job-retry Lambdas were OOM-adjacent at 87% memory utilisation (223 MB peak on the 256 MB default) on a fleet of three runners.

Copilot

Pull request overview

This PR fixes configuration wiring in the modules/runners Terraform module so that user-provided job_retry.lambda_memory_size and job_retry.lambda_timeout are actually passed into the internal job-retry/Lambda submodule rather than being silently dropped.

Changes:

Thread var.job_retry.lambda_memory_size through as memory_size in local.job_retry.
Thread var.job_retry.lambda_timeout through as timeout in local.job_retry.

Brend-Smits

Looks good, thanks for the fix!

🤖 I have created a release *beep* *boop*

---

## [7.7.0](v7.6.1...v7.7.0) (2026-06-11)

### Features

* Add feature to enable dynamic ec2 config via workflow labels ([#5003](#5003)) ([c68445d](c68445d))
* add support for macos runners ([#4930](#4930)) ([3e179a3](3e179a3))
* Introduce Amazon Linux 2023 ARM image ([#4780](#4780)) ([e572ae5](e572ae5))
* relax cpu_options schema and add amd_sev_snp + nested_virtualization support ([#5039](#5039)) ([5a3746d](5a3746d))
* **runner-role:** Enable using separate IAM role for runners ([#4875](#4875)) ([6642e57](6642e57))

### Bug Fixes

* **ci:** sign auto-generated docs commits ([#5154](#5154)) ([a6af4d2](a6af4d2))
* **runners:** wire job_retry.lambda_memory_size and lambda_timeout ([#5120](#5120)) ([404785e](404785e))
* **scale-up:** Add ec2:TerminateInstances permission to scale-up Lambda IAM policy ([#5152](#5152)) ([94c4e12](94c4e12))
* **scale-up:** prevent negative TotalTargetCapacity when runners exceed maximum ([#5062](#5062)) ([9ab7410](9ab7410))
* **webhook:** Fix publish events to EventBridge ([#5143](#5143)) ([a72b737](a72b737))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: runners-releaser[bot] <194412594+runners-releaser[bot]@users.noreply.github.com>

oscarbc96 requested a review from a team as a code owner May 10, 2026 10:43

Brend-Smits requested a review from Copilot June 11, 2026 07:36

Merge branch 'main' into fix/job-retry-lambda-memory-and-timeout

d00cef9

Copilot started reviewing on behalf of Brend-Smits June 11, 2026 07:36

Copilot AI reviewed Jun 11, 2026

Comment thread modules/runners/job-retry.tf

Brend-Smits approved these changes Jun 11, 2026

Brend-Smits merged commit 404785e into github-aws-runners:main Jun 11, 2026
41 checks passed

runners-releaser Bot mentioned this pull request Jun 11, 2026

chore(main): release 7.7.0 #5151

Merged

Brend-Smits merged 2 commits into github-aws-runners:main from oscarbc96:fix/job-retry-lambda-memory-and-timeout
