feat: Add feature to enable dynamic ec2 config via workflow labels#5003
Conversation
|
@edersonbrilhante great to see this PR. |
5dab5e4 to
afdfb36
Compare
|
This is a really interesting feature! Just one suggestion from my side: would it be possible to support a whitelist of allowed instance types? Also, it could be really powerful to have some kind of feature-flag / policy control over which parts of the configuration are allowed to be dynamic. For example, in my org we don’t want developers to be able to select arbitrary AMIs (only a pre-approved set), but it would be awesome to still allow them to choose the instance type for workflow jobs, as long as it’s constrained to an allowed list. Maybe the "feature flag" is not even necessary as long as we could define the "allowed values" for each configuration, with this we could list only the pre-approved AMIs. |
|
@andrecastro I liked and makes a lot of sense to me. I just need more time to think about the implementation. And tbis PR is already big enough XD. I could create a following for adding this restricted values feature |
stuartp44
left a comment
There was a problem hiding this comment.
I am happy to approve, but I do have a statement about incorrect labels and the effect on the process.
|
I also agree with what was previously mentioned; we probably need a safelist, as we don't want lateral movement when a compromised pipeline is used, especially with the VPC setting. Maybe worth some "allowed_instance_type" setting or something to that effect that can be checked against, and if not in the list, ignored. |
afdfb36 to
0e4f870
Compare
|
@stuartp44 We can add the safelist, but I think we need a deeper discussion about the implementation |
aae08eb to
750a9ff
Compare
ad2f612 to
e67a37e
Compare
|
@edersonbrilhante tried to test the feature but so far not got it wokring Used labels: |
|
Can you print the logs from dispatch-to-runner? Check if the labels were accepted or not. |
npalm
left a comment
There was a problem hiding this comment.
Great feature. Maybe we can add a small saftety net in the disspachter. A clear warning in the docs. And refer in the variable where users can enable to the risk in the docs.
|
@npalm can you review |
Co-authored-by: Niek Palm <npalm@users.noreply.github.com>
eca2581 to
e6e19ce
Compare
e6e19ce to
f987e00
Compare
Brend-Smits
left a comment
There was a problem hiding this comment.
LGTM, tested it and it seems to work. Documentation looking great.
Thanks a lot!
Let's improve it from here by adding allow/deny lists 👍🏼
🤖 I have created a release *beep* *boop* --- ## [7.7.0](v7.6.1...v7.7.0) (2026-06-11) ### Features * Add feature to enable dynamic ec2 config via workflow labels ([#5003](#5003)) ([c68445d](c68445d)) * add support for macos runners ([#4930](#4930)) ([3e179a3](3e179a3)) * Introduce Amazon Linux 2023 ARM image ([#4780](#4780)) ([e572ae5](e572ae5)) * relax cpu_options schema and add amd_sev_snp + nested_virtualization support ([#5039](#5039)) ([5a3746d](5a3746d)) * **runner-role:** Enable using separate IAM role for runners ([#4875](#4875)) ([6642e57](6642e57)) ### Bug Fixes * **ci:** sign auto-generated docs commits ([#5154](#5154)) ([a6af4d2](a6af4d2)) * **runners:** wire job_retry.lambda_memory_size and lambda_timeout ([#5120](#5120)) ([404785e](404785e)) * **scale-up:** Add ec2:TerminateInstances permission to scale-up Lambda IAM policy ([#5152](#5152)) ([94c4e12](94c4e12)) * **scale-up:** prevent negative TotalTargetCapacity when runners exceed maximum ([#5062](#5062)) ([9ab7410](9ab7410)) * **webhook:** Fix publish events to EventBridge ([#5143](#5143)) ([a72b737](a72b737)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: runners-releaser[bot] <194412594+runners-releaser[bot]@users.noreply.github.com>
|
@edersonbrilhante @Brend-Smits I'm working on getting rid of conflicts in #5077 and there is a I wonder what's the use case of Priority being set by dynamic label? Priority makes sense mostly (and possibly only) when you define an EC2 fleet that has multiple instance_types and have prioritised allocation type. Then during a scale-up it tries to launch the instance type with the highest priority (and then going down the list in case of no available capacity). For the case of dynamic labels where you create a separate fleet and that fleet has only 1 instance type it doesn't really make sense to define the priority. |
|
@piotrj the logic is the same as today: "the first match", then uses the dynamic labels to override the launch template. |
Summary
This PR resumes and completes the work started in #4529.
It also allows to use any other dynamic labels with prefix
ghr-. Giving support for unique labels per job or per group of jobsIt ensures that EC2-specific config can be defined via
run-onsHow to test:
Use your regular labels, and add ghr-ec2-instance-type and ghr-ec2-image-id
In this case:
<regular-labels>