Skip to content

feat(runners): support OnDemandOptions.AllocationStrategy in EC2 Fleet#5077

Open
piotrj wants to merge 10 commits into
github-aws-runners:mainfrom
ClipboardHealth:feat/on-demand-allocation-strategy
Open

feat(runners): support OnDemandOptions.AllocationStrategy in EC2 Fleet#5077
piotrj wants to merge 10 commits into
github-aws-runners:mainfrom
ClipboardHealth:feat/on-demand-allocation-strategy

Conversation

@piotrj

@piotrj piotrj commented Mar 24, 2026

Copy link
Copy Markdown

Summary

Currently, createInstances always sets SpotOptions.AllocationStrategy regardless of the fleet's targetCapacityType. For on-demand fleets, AWS silently ignores SpotOptions and defaults to lowest-price, making the instance_allocation_strategy variable and instance_types ordering meaningless for on-demand users who want prioritized.

This PR fixes that by conditionally setting SpotOptions or OnDemandOptions based on targetCapacityType, so the allocation strategy is correctly applied for both spot and on-demand fleets.

Why instance_type_priorities?

The priority-aware allocation strategies — prioritized (on-demand) and capacity-optimized-prioritized (spot) — don't simply use the array order of launch template overrides. They rely on an explicit Priority field on each override entry. Without setting Priority, all overrides have equal (lowest) priority and these strategies effectively ignore the intended instance-type ordering.

We introduced the optional instance_type_priorities variable (map(number), e.g. { "m5.large" = 1, "c5.large" = 2 }) as a non-breaking addition to give users explicit control over the priority assigned to each instance type. When not provided, priorities automatically fall back to the index position in instance_types, so existing configurations continue to work unchanged — the first instance type in the list gets the highest priority (0), the second gets 1, and so on. This makes the priority-aware strategies work correctly out of the box while still allowing fine-grained control when needed.

Note: The Priority field is only set on fleet overrides when instance_allocation_strategy is "prioritized" (on-demand) or "capacity-optimized-prioritized" (spot). For all other allocation strategies, overrides remain unchanged — no Priority is added. This ensures that existing setups using strategies like capacity-optimized or lowest-price are completely unaffected by this change.

What changed

Conditional SpotOptions / OnDemandOptions

  • The createInstances function now checks targetCapacityType and sets SpotOptions (with MaxTotalPrice and AllocationStrategy) for spot fleets, or OnDemandOptions (with AllocationStrategy) for on-demand fleets.
  • "prioritized" is added to the instance_allocation_strategy validation list in all Terraform variable definitions.

Allocation strategy sanitization

  • The instance_allocation_strategy variable accepts the union of spot and on-demand strategies, so a value valid for one capacity type can be invalid for the other. Before building the CreateFleet request, the strategy is now sanitized against the target capacity type and falls back to lowest-price (the AWS default) when invalid — preventing invalid values from reaching SpotOptions/OnDemandOptions and preserving backwards compatibility for on-demand users whose spot-only strategy was previously ignored.
  • The same sanitization is reused when falling back from spot to on-demand.

Override priorities

  • generateFleetOverrides sets Priority on each override for the priority-aware strategies — prioritized (on-demand) and capacity-optimized-prioritized (spot). It uses instance_type_priorities if provided, or the index in instance_types otherwise. All other strategies leave overrides unchanged.
  • The new instance_type_priorities variable is threaded through the full chain: root Terraform variables → runners module → Lambda environment variables (INSTANCE_TYPE_PRIORITIES as JSON) → ec2instanceCriteriagenerateFleetOverrides.

@piotrj piotrj requested review from a team as code owners March 24, 2026 16:02
@piotrj piotrj force-pushed the feat/on-demand-allocation-strategy branch 4 times, most recently from 25c0704 to 73c147f Compare March 24, 2026 18:57
@piotrj piotrj force-pushed the feat/on-demand-allocation-strategy branch from 73c147f to 17fc960 Compare April 20, 2026 10:40
@piotrj

piotrj commented Apr 20, 2026

Copy link
Copy Markdown
Author

Rebased the PR to resolve conflicts.
For what it's worth, we've been running this implementation in our organization without issues.

@Brend-Smits

Copy link
Copy Markdown
Contributor

Hey @piotrj

This looks like a great change! I apologize for the delay in getting this reviewed/merged. We're not cleaning up the backlog of PR's and will make sure to get faster review cycles in.

Could you please make sure the conflicts are resolved in your branch? I will review afterwards!
Thank you for taking the time to contribute this. Great work!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the runner scale-up implementation to correctly apply EC2 Fleet allocation strategy settings for both Spot and On-Demand fleets, and introduces optional per–instance-type priority controls to make the prioritized strategy behave as intended.

Changes:

  • Conditionally set SpotOptions vs OnDemandOptions in the EC2 CreateFleet request based on targetCapacityType.
  • Add and plumb through instance_type_priorities (Terraform → Lambda env var → control-plane criteria → fleet overrides).
  • Extend control-plane tests to cover on-demand allocation strategies and override priorities.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
variables.tf Adds prioritized to allocation strategy validation and introduces instance_type_priorities at root.
main.tf Wires instance_type_priorities into the runners module.
modules/runners/variables.tf Adds module-level instance_type_priorities and expands allocation strategy validation.
modules/runners/scale-up.tf Exposes INSTANCE_TYPE_PRIORITIES env var to the scale-up Lambda.
modules/runners/pool.tf Threads instance_type_priorities into the pool submodule config.
modules/runners/pool/variables.tf Extends pool config schema to include optional instance_type_priorities.
modules/runners/pool/main.tf Exposes INSTANCE_TYPE_PRIORITIES env var to the pool Lambda.
modules/multi-runner/variables.tf Adds instance_type_priorities option to multi-runner configuration and documents it.
modules/multi-runner/runners.tf Passes instance_type_priorities from multi-runner config into the runners module.
lambdas/functions/control-plane/src/scale-runners/scale-up.ts Parses INSTANCE_TYPE_PRIORITIES and forwards it into runner creation criteria.
lambdas/functions/control-plane/src/pool/pool.ts Parses INSTANCE_TYPE_PRIORITIES and forwards it into runner creation criteria.
lambdas/functions/control-plane/src/aws/runners.ts Implements override priority injection and conditional Spot vs On-Demand fleet options.
lambdas/functions/control-plane/src/aws/runners.test.ts Adds/updates tests for on-demand strategies and per-type priority handling.
lambdas/functions/control-plane/src/aws/runners.d.ts Updates runner input typings to allow on-demand allocation strategies and priorities.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lambdas/functions/control-plane/src/aws/runners.ts Outdated
Comment thread modules/runners/variables.tf Outdated
Comment thread modules/multi-runner/variables.tf Outdated
@piotrj piotrj force-pushed the feat/on-demand-allocation-strategy branch from 5610936 to 17fc960 Compare June 11, 2026 08:15
piotrj added 5 commits June 11, 2026 10:19
Previously, createInstances always set SpotOptions.AllocationStrategy
even for on-demand fleets, making instance_allocation_strategy and
instance_types ordering meaningless for on-demand users wanting
`prioritized`. Now the fleet request conditionally sets SpotOptions
or OnDemandOptions based on targetCapacityType, and the spot-to-
on-demand failover path maps spot-only strategies to `lowest-price`.
Add optional `instance_type_priorities` variable (map of instance type
to priority number) to control EC2 Fleet override priorities. When not
set, priorities default to the index position in `instance_types`,
preserving the user's ordering. This makes the `prioritized` allocation
strategy work correctly for both spot and on-demand fleets.
Move the prioritized strategy check into generateFleetOverrides itself
rather than having the caller decide what to pass, making the logic
more explicit and self-contained.
…d priority

Spread ec2OverrideConfig after the computed prioritized Priority so an
explicitly provided override value (e.g. Priority from a ghr-ec2-* label)
wins when both are present.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@piotrj piotrj force-pushed the feat/on-demand-allocation-strategy branch from 17fc960 to d81b270 Compare June 11, 2026 09:28
piotrj and others added 4 commits June 11, 2026 11:42
The instance_allocation_strategy variable accepts the union of spot and
on-demand strategies, so a value valid for one capacity type can be invalid
for the other. Routing it unsanitized into SpotOptions/OnDemandOptions could
send invalid values to AWS (e.g. capacity-optimized for on-demand, prioritized
for spot) and fail CreateFleet, and broke backwards compatibility for on-demand
users whose spot-only strategy was previously ignored.

Add sanitizeAllocationStrategy() that falls back to lowest-price (the AWS
default) when the configured strategy is invalid for the target capacity type,
and reuse it in the on-demand failover path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AWS now recommends price-capacity-optimized for spot fleets. Update the
instance_allocation_strategy descriptions in the runners and multi-runner
modules to match the root variable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The prioritized on-demand allocation strategy is now an allowed value for the
instance_allocation_strategy variable and is wired into the lambda via the
INSTANCE_ALLOCATION_STRATEGY env var. Add it to the declared ProcessEnv union
so the type matches the values reachable at runtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…spot fleets

The spot capacity-optimized-prioritized strategy honors the Priority field of
launch template overrides (best-effort), just like the on-demand prioritized
strategy. Apply instance_type_priorities for both strategies instead of only
prioritized, and update the instance_type_priorities variable descriptions
accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@piotrj

piotrj commented Jun 11, 2026

Copy link
Copy Markdown
Author

@Brend-Smits rebased the PR so the conflicts are gone.

I let the ec2OverrideConfig from #5003 be the last item in the creation of CreateFleetCommand so that in case there is Priority set there then it will be the one that stands. But I still think that priority setting through dynamic labels is not very useful.

Also, I expanded the priorities to spot's capacity-optimized-prioritized allocation strategy. Before, even if someone used that allocation strategy it wouldn't really work cause there were no priorities set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants