feat(SLSRE-704): add scale.extraMachinePools for additional capacity reservation#617
feat(SLSRE-704): add scale.extraMachinePools for additional capacity reservation#617Ajpantuso wants to merge 2 commits into
Conversation
…electing additional MachinePools to reserve capacity for during upgrades
WalkthroughThis PR extends the managed upgrade operator's scaling subsystem to support additional machine pool pools beyond workers. The ChangesExtra Machine Pool Scaling
Documentation and team updates
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Ajpantuso The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@Ajpantuso: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/scaler/machineSetScaler.go (1)
272-285:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid early return when one upgrade MachineSet already exists.
Returning immediately on the first existing
*-upgradeMachineSet prevents creating other missing upgrade MachineSets in the same reconcile loop (partial state never converges reliably).Suggested fix
func extraMachineSetCreated(c client.Client, originalMachinesets, upgradeMachinesets machineapi.MachineSetList, logger logr.Logger) (bool, error) { + createdAny := false for _, ms := range originalMachinesets.Items { found := false for _, ums := range upgradeMachinesets.Items { if ums.Name == ms.Name+"-upgrade" { found = true } } // extra machine already created if found { logger.Info(fmt.Sprintf("machineset for upgrade already created :%s", ms.Name)) - return false, nil + continue } @@ err := c.Create(context.TODO(), newMs) if err != nil { logger.Error(err, "failed to create machineset") return false, err } + createdAny = true } - return true, nil + return createdAny, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/scaler/machineSetScaler.go` around lines 272 - 285, The function extraMachineSetCreated currently returns immediately when it finds the first original MachineSet that already has a corresponding "*-upgrade" MachineSet, which prevents checking the rest; change the logic in extraMachineSetCreated to avoid early returns inside the loop over originalMachinesets: for each ms in originalMachinesets.Items, check upgradeMachinesets.Items and log if an upgrade exists (using logger.Info and ms.Name) but do not return; instead track whether any originals are missing their corresponding "*-upgrade" (e.g., increment a missing counter or set a allExist boolean) and after the loop finishes return a single result (bool, error) that reflects whether all upgrade MachineSets exist (or not) and any error encountered. Ensure you update the return sites to use that aggregated flag and keep references to extraMachineSetCreated, originalMachinesets, upgradeMachinesets, ms and logger to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/configmap.md`:
- Line 103: The fenced code block containing the config snippet for the scale
section is unlabeled and triggers markdownlint MD040; update the opening fence
from ``` to ```yaml so the block is language-identified (the block that contains
the scale/timeOut/extraMachinePools snippet), ensuring the fenced example is
declared as YAML.
In `@pkg/upgraders/config_test.go`:
- Around line 4-5: Update the Ginkgo import in the test file to use Ginkgo v2 by
changing the import path from "github.com/onsi/ginkgo" to
"github.com/onsi/ginkgo/v2" (the dotted import used at the top of
pkg/upgraders/config_test.go), so the test uses the v2 API consistently with
other *_test.go files; ensure the companion Gomega import stays unchanged.
In `@pkg/upgraders/scalerstep_test.go`:
- Around line 176-177: Add concise failure messages to the new Ginkgo assertions
so CI failures are actionable: update the two assertions
Expect(err).To(Not(HaveOccurred())) and Expect(ok).To(BeTrue()) (and the similar
pair at the other location) to include short message strings as the final
argument, e.g. Expect(err).To(Not(HaveOccurred()), "unexpected error from
<operation/context>") and Expect(ok).To(BeTrue(), "expected ok to be true after
<operation/context>"), replacing <operation/context> with the specific operation
being tested in scalerstep_test.go.
- Line 172: The mock expectation uses gomock.Any() for the new []string
ExtraMachinePools argument which hides regressions; update the
EnsureScaleUpNodes expectation on mockScalerClient to assert the exact forwarded
slice instead of gomock.Any() — compute or reference the expected extra pools in
the test and replace the gomock.Any() argument with
gomock.Eq(expectedExtraMachinePools) (or gomock.ElementsMatch/DeepEqual
equivalent) so EnsureScaleUpNodes(..., config.GetScaleDuration(),
expectedExtraMachinePools, ...) verifies the forwarded ExtraMachinePools value.
---
Outside diff comments:
In `@pkg/scaler/machineSetScaler.go`:
- Around line 272-285: The function extraMachineSetCreated currently returns
immediately when it finds the first original MachineSet that already has a
corresponding "*-upgrade" MachineSet, which prevents checking the rest; change
the logic in extraMachineSetCreated to avoid early returns inside the loop over
originalMachinesets: for each ms in originalMachinesets.Items, check
upgradeMachinesets.Items and log if an upgrade exists (using logger.Info and
ms.Name) but do not return; instead track whether any originals are missing
their corresponding "*-upgrade" (e.g., increment a missing counter or set a
allExist boolean) and after the loop finishes return a single result (bool,
error) that reflects whether all upgrade MachineSets exist (or not) and any
error encountered. Ensure you update the return sites to use that aggregated
flag and keep references to extraMachineSetCreated, originalMachinesets,
upgradeMachinesets, ms and logger to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 86c3fe33-669d-4e33-b8b3-0013d2ba0198
⛔ Files ignored due to path filters (4)
boilerplate/_data/last-boilerplate-commitis excluded by!boilerplate/**boilerplate/openshift/golang-osd-operator/OWNERS_ALIASESis excluded by!boilerplate/**build/Dockerfileis excluded by!build/**build/Dockerfile.olm-registryis excluded by!build/**
📒 Files selected for processing (10)
OWNERS_ALIASESdocs/configmap.mdpkg/scaler/machineSetScaler.gopkg/scaler/mocks/scaler.gopkg/scaler/scaler.gopkg/scaler/scaling_test.gopkg/upgraders/config.gopkg/upgraders/config_test.gopkg/upgraders/scalerstep.gopkg/upgraders/scalerstep_test.go
| | `extraMachinePools` | optional list of glob patterns matching additional `hive.openshift.io/machine-pool` label values whose MachineSets should also be scaled up during capacity reservation (e.g. `non-serving-*`). Patterns use Go's `path.Match` syntax. When empty or absent, only the default `worker` pool is scaled. | | ||
|
|
||
| Example: | ||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the fenced example block.
Line 103 uses an unlabeled fenced code block, which triggers markdownlint MD040. Use ```yaml for this config snippet.
Proposed fix
-```
+```yaml
scale:
timeOut: 30
extraMachinePools:
- "non-serving-*"</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 103-103: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @docs/configmap.md at line 103, The fenced code block containing the config
snippet for the scale section is unlabeled and triggers markdownlint MD040;
update the opening fence from toyaml so the block is language-identified
(the block that contains the scale/timeOut/extraMachinePools snippet), ensuring
the fenced example is declared as YAML.
</details>
<!-- fingerprinting:phantom:triton:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
| . "github.com/onsi/ginkgo" | ||
| . "github.com/onsi/gomega" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether *_test.go files still import Ginkgo v1.
rg -n --type=go 'github.com/onsi/ginkgo"' -g '**/*_test.go'
rg -n --type=go 'github.com/onsi/ginkgo/v2"' -g '**/*_test.go'Repository: openshift/managed-upgrade-operator
Length of output: 4763
Use Ginkgo v2 in pkg/upgraders/config_test.go
pkg/upgraders/config_test.go imports . "github.com/onsi/ginkgo" (v1) at line 4; switch to . "github.com/onsi/ginkgo/v2" to match the *_test.go Ginkgo v2 guideline.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/upgraders/config_test.go` around lines 4 - 5, Update the Ginkgo import in
the test file to use Ginkgo v2 by changing the import path from
"github.com/onsi/ginkgo" to "github.com/onsi/ginkgo/v2" (the dotted import used
at the top of pkg/upgraders/config_test.go), so the test uses the v2 API
consistently with other *_test.go files; ensure the companion Gomega import
stays unchanged.
| gomock.InOrder( | ||
| mockCVClient.EXPECT().HasUpgradeCommenced(gomock.Any()).Return(false, nil), | ||
| mockScalerClient.EXPECT().CanScale(gomock.Any(), gomock.Any()).Return(false, nil), | ||
| mockScalerClient.EXPECT().EnsureScaleUpNodes(gomock.Any(), config.GetScaleDuration(), gomock.Any(), gomock.Any()).Return(true, nil), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -nP 'EnsureScaleUpNodes\([^)]*gomock\.Any\(\)\)\.Return' pkg/upgraders/scalerstep_test.goRepository: openshift/managed-upgrade-operator
Length of output: 60
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="pkg/upgraders/scalerstep_test.go"
echo "== EnsureScaleUpNodes occurrences =="
rg -n "EnsureScaleUpNodes" "$FILE" || true
echo
echo "== Import block (first ~80 lines) =="
sed -n '1,90p' "$FILE" | nl -ba
echo
echo "== Assertions around lines ~150-220 =="
sed -n '140,230p' "$FILE" | nl -ba
echo
echo "== HaveOccurred() assertions without messages (best-effort) =="
rg -n "HaveOccurred\\(\\)" "$FILE" || trueRepository: openshift/managed-upgrade-operator
Length of output: 689
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="pkg/upgraders/scalerstep_test.go"
echo "== Import block =="
sed -n '1,120p' "$FILE" | cat -n
echo
echo "== Assertions/expect blocks near 140-230 =="
sed -n '130,240p' "$FILE" | cat -n
echo
echo "== HaveOccurred() / NotTo(HaveOccurred()) occurrences (best-effort) =="
rg -n "NotTo\\(HaveOccurred\\(\\)\\)|To\\(HaveOccurred\\(\\)\\)|HaveOccurred\\(\\)" "$FILE" || trueRepository: openshift/managed-upgrade-operator
Length of output: 10827
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "type .*Scaler.*interface|EnsureScaleUpNodes" -S pkg/scaler pkg/upgraders | head -n 50
echo "== Search in scaler interface =="
rg -n "EnsureScaleUpNodes" pkg/scaler --glob='*.go'
echo
echo "== Search in scaler mock interface =="
rg -n "EnsureScaleUpNodes" pkg/scaler/mocks --glob='*.go'
echo
echo "== Show relevant parts of mock file =="
MOCKFILE=$(rg -l "EnsureScaleUpNodes" pkg/scaler/mocks --glob='*.go' | head -n 1 || true)
if [[ -n "${MOCKFILE:-}" ]]; then
echo "Mock file: $MOCKFILE"
sed -n '1,200p' "$MOCKFILE" | cat -n
else
echo "No mock file found containing EnsureScaleUpNodes"
fiRepository: openshift/managed-upgrade-operator
Length of output: 12210
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="pkg/upgraders/scalerstep_test.go"
echo "== buildTestUpgraderConfig definition =="
rg -n "func buildTestUpgraderConfig\\b|buildTestUpgraderConfig\\(" "$FILE"
if rg -q "func buildTestUpgraderConfig\\b" "$FILE"; then
# print around the function if present in this file
FUNC_LINE=$(rg -n "func buildTestUpgraderConfig\\b" "$FILE" | head -n1 | cut -d: -f1)
START=$((FUNC_LINE-20))
END=$((FUNC_LINE+80))
sed -n "${START},${END}p" "$FILE" | cat -n
else
# search for it in package
rg -n "func buildTestUpgraderConfig\\b" pkg/upgraders | head -n 20
fi
echo
echo "== ExtraMachinePools assignment in this test file =="
rg -n "ExtraMachinePools\\s*=" "$FILE"Repository: openshift/managed-upgrade-operator
Length of output: 491
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect buildTestUpgraderConfig defaults
SUITE_FILE="pkg/upgraders/upgraders_suite_test.go"
echo "== buildTestUpgraderConfig in $SUITE_FILE =="
rg -n "func buildTestUpgraderConfig\\b" "$SUITE_FILE"
FUNC_LINE=$(rg -n "func buildTestUpgraderConfig\\b" "$SUITE_FILE" | head -n1 | cut -d: -f1)
START=$((FUNC_LINE-30)); END=$((FUNC_LINE+120))
sed -n "${START},${END}p" "$SUITE_FILE" | cat -n
echo
echo "== Ginkgo imports in suite file =="
sed -n '1,80p' "$SUITE_FILE" | cat -n
# 2) Inspect go.mod for Ginkgo version
echo
echo "== go.mod ginkgo references =="
if [[ -f go.mod ]]; then
rg -n "ginkgo" go.mod || true
rg -n "onsi/ginkgo" go.mod || true
fiRepository: openshift/managed-upgrade-operator
Length of output: 2146
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SUITE_FILE="pkg/upgraders/upgraders_suite_test.go"
echo "== Function line for buildTestUpgraderConfig =="
FUNC_LINE=$(rg -n "func buildTestUpgraderConfig\\b" "$SUITE_FILE" | head -n1 | cut -d: -f1)
echo "FUNC_LINE=$FUNC_LINE"
START=$((FUNC_LINE-10))
END=$((FUNC_LINE+120))
echo "== buildTestUpgraderConfig context lines ${START}-${END} =="
sed -n "${START},${END}p" "$SUITE_FILE" | cat -n
echo
echo "== Suite file imports (top 60 lines) =="
sed -n '1,60p' "$SUITE_FILE" | cat -n
echo
echo "== go.mod ginkgo references =="
rg -n "ginkgo|onsi/ginkgo" go.mod || trueRepository: openshift/managed-upgrade-operator
Length of output: 2490
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "type (scaleConfig|upgraderConfig) struct|ExtraMachinePools" pkg/upgraders | head -n 50
rg -n "ExtraMachinePools" pkg/upgraders | head -n 50Repository: openshift/managed-upgrade-operator
Length of output: 2929
Assert forwarded ExtraMachinePools value in the EnsureScaleUpNodes mock expectation.
gomock.Any() for the new []string argument can hide regressions where incorrect extra machine pool patterns are forwarded.
Suggested test tightening
- mockScalerClient.EXPECT().EnsureScaleUpNodes(gomock.Any(), config.GetScaleDuration(), gomock.Any(), gomock.Any()).Return(true, nil),
+ mockScalerClient.EXPECT().EnsureScaleUpNodes(
+ gomock.Any(),
+ config.GetScaleDuration(),
+ gomock.Any(),
+ gomock.Eq(config.Scale.ExtraMachinePools),
+ ).Return(true, nil),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mockScalerClient.EXPECT().EnsureScaleUpNodes(gomock.Any(), config.GetScaleDuration(), gomock.Any(), gomock.Any()).Return(true, nil), | |
| mockScalerClient.EXPECT().EnsureScaleUpNodes( | |
| gomock.Any(), | |
| config.GetScaleDuration(), | |
| gomock.Any(), | |
| gomock.Eq(config.Scale.ExtraMachinePools), | |
| ).Return(true, nil), |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/upgraders/scalerstep_test.go` at line 172, The mock expectation uses
gomock.Any() for the new []string ExtraMachinePools argument which hides
regressions; update the EnsureScaleUpNodes expectation on mockScalerClient to
assert the exact forwarded slice instead of gomock.Any() — compute or reference
the expected extra pools in the test and replace the gomock.Any() argument with
gomock.Eq(expectedExtraMachinePools) (or gomock.ElementsMatch/DeepEqual
equivalent) so EnsureScaleUpNodes(..., config.GetScaleDuration(),
expectedExtraMachinePools, ...) verifies the forwarded ExtraMachinePools value.
| Expect(err).To(Not(HaveOccurred())) | ||
| Expect(ok).To(BeTrue()) |
There was a problem hiding this comment.
Add failure messages to new expectations for debuggability.
The new assertions omit failure context; include short messages so CI failures are actionable.
As per coding guidelines Ginkgo test code quality rule: assertions should include meaningful failure messages, and assertions like Expect(err).NotTo(HaveOccurred()) without messages should be flagged.
Also applies to: 188-189
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/upgraders/scalerstep_test.go` around lines 176 - 177, Add concise failure
messages to the new Ginkgo assertions so CI failures are actionable: update the
two assertions Expect(err).To(Not(HaveOccurred())) and Expect(ok).To(BeTrue())
(and the similar pair at the other location) to include short message strings as
the final argument, e.g. Expect(err).To(Not(HaveOccurred()), "unexpected error
from <operation/context>") and Expect(ok).To(BeTrue(), "expected ok to be true
after <operation/context>"), replacing <operation/context> with the specific
operation being tested in scalerstep_test.go.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #617 +/- ##
==========================================
+ Coverage 53.59% 54.27% +0.67%
==========================================
Files 123 123
Lines 6165 6204 +39
==========================================
+ Hits 3304 3367 +63
+ Misses 2668 2631 -37
- Partials 193 206 +13
🚀 New features to boost your workflow:
|
Summary
scale.extraMachinePoolsConfigMap field (list of glob patterns) to select additional MachineSets beyond the defaultworkerpool for capacity reservation scaling during upgrades.non-serving-*) alongside workers, controlled per-cluster via the operator ConfigMap.Changes
pkg/upgraders/config.go: AddedExtraMachinePools []stringtoscaleConfig, with glob pattern validation viapath.Match. WiredScale.IsValid()into theupgraderConfig.IsValid()chain so invalid patterns fail at upgrader construction time.pkg/scaler/scaler.go: UpdatedScalerinterface —EnsureScaleUpNodesnow accepts a[]stringparameter for extra pool patterns.pkg/scaler/machineSetScaler.go: AddedLABEL_MACHINE_POOLconstant andlistMachineSetsByPoolPatterns()helper (lists MachineSets withhive.openshift.io/machine-poollabel, filters by glob, excludesworker). RestructuredEnsureScaleUpNodesso worker and extra pool fetching are independent.pkg/upgraders/scalerstep.go: ThreadsExtraMachinePoolsintoEnsureScaleUpNodes. WhenCanScalereturns false (no worker MachineSets) but extra pools are configured, scaling proceeds with extra pools only instead of skipping entirely.Tests: 8 new scaler tests, 2 new scalerstep tests, 12 new config validation tests. All existing tests updated for the new interface.
docs/configmap.md: Documented the newextraMachinePoolsfield with description and YAML example.Configuration
When
extraMachinePoolsis empty or absent, behavior is identical to the current implementation — only the defaultworkerpool is scaled. Patterns use Go'spath.Matchsyntax.Summary by CodeRabbit
Release Notes
New Features
extraMachinePoolsconfiguration option enabling capacity reservation to scale additional machine pools beyond workers using glob pattern matching.Documentation
extraMachinePoolsknob and usage examples.Tests