Committed reservations allocations consider enter/leave events of VMs #644
Conversation
> **Note — Reviews paused.** It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit review settings.
📝 **Walkthrough**

Adds two configurable grace-period durations and updates commitment reservation reconciliation to verify recent allocations via the Nova API and older allocations via the Hypervisor CRD (with a Nova fallback). Controller requeue timing and patching behavior are updated; tests and documentation are extended to cover the new verification paths.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    participant Controller as Controller
    participant Allocation as Allocation
    participant Nova as Nova API
    participant Hypervisor as Hypervisor CRD
    participant Status as Reservation Status
    Controller->>Allocation: Inspect allocation age
    alt age < AllocationGracePeriod
        Controller->>Nova: Optional Get VM details
        alt Nova returns host
            Nova-->>Controller: Host info
            Controller->>Status: Add/update allocation with host
        else Nova missing/404/no host
            Nova-->>Controller: Not yet visible
            Controller->>Status: Skip (defer verification)
        end
        Controller->>Controller: Mark HasAllocationsInGracePeriod=true
    else age >= AllocationGracePeriod
        Controller->>Hypervisor: Check Instances list for VM
        alt Found on Hypervisor
            Hypervisor-->>Controller: Instance present
            Controller->>Status: Add/update allocation with host
        else Not found on Hypervisor
            Controller->>Nova: Fallback Get VM details
            alt Nova confirms host
                Nova-->>Controller: Host info
                Controller->>Status: Add/update allocation
            else Nova 404 / not found
                Nova-->>Controller: Absent
                Controller->>Allocation: Remove from Spec.Allocations
                Controller->>Controller: Patch Spec, re-fetch, patch Status
            end
        end
    end
    Controller->>Controller: Requeue after RequeueIntervalGracePeriod or RequeueIntervalActive
```
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 **Pre-merge checks:** ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
helm/bundles/cortex-nova/alerts/nova.alerts.yaml (1)
**489-506:** ⚠️ Potential issue | 🟡 Minor

The disabled-feature explanation does not match how this metric is emitted.

`cortex_committed_resource_syncer_runs_total` is registered unconditionally in `internal/scheduling/reservations/commitments/syncer_monitor.go:37-85`, so `absent(...)` only tells us the service is not exposing metrics. A cluster with `commitments-sync-task` disabled will still alert via `increase(...) = 0`, which makes the new “metric is missing / feature is not enabled” guidance misleading for on-call. Please gate this rule on a real task-enabled signal, or drop the disabled-feature explanation from the annotation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/alerts/nova.alerts.yaml` around lines 489 - 506, The annotation's guidance is incorrect because cortex_committed_resource_syncer_runs_total is always registered, so absent(...) means metrics aren't being scraped, not that the feature is disabled; either remove the “or the feature is not enabled” language from the description/summary in nova.alerts.yaml (and leave the existing expr using cortex_committed_resource_syncer_runs_total) or gate the alert by a real task-enabled signal (e.g., add a condition that requires a boolean/metric that indicates the commitments-sync-task is enabled such as commitments_sync_task_enabled == 1) so the alert only fires when the task is actually enabled; update the description to match the chosen approach and reference cortex_committed_resource_syncer_runs_total and the alert block in nova.alerts.yaml.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helm/bundles/cortex-nova/alerts/nova.alerts.yaml`:
- Around line 525-529: The alert divides a CounterVec
(cortex_committed_resource_syncer_commitments_skipped_total with label reason)
by a Counter (cortex_committed_resource_syncer_commitments_total) causing label
mismatch and empty results; fix by aggregating away the reason label before the
division, e.g. replace
rate(cortex_committed_resource_syncer_commitments_skipped_total{service="cortex-nova-metrics",
reason="..."}[1h]) with sum by
(service)(rate(cortex_committed_resource_syncer_commitments_skipped_total{service="cortex-nova-metrics"}[1h]))
so the numerator and denominator share the same labels, and apply the same
change for the second occurrence around lines 548-552.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cf03d3d5-e2e5-4518-9789-4d639071c0bc
📒 Files selected for processing (1)
helm/bundles/cortex-nova/alerts/nova.alerts.yaml
This reverts commit b8f55a1.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/scheduling/nova/plugins/filters/filter_committed_resource_bookkeeping.go (1)
**72-85:** Consider an indexed lookup instead of a full reservation scan in the scheduling hot path.

Listing and iterating all reservations on every filter run is O(n) per request; this can become costly at scale. Consider cache indexers/field selectors (projectID, resourceGroup, type, readiness) to shrink scan cost.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/plugins/filters/filter_committed_resource_bookkeeping.go` around lines 72 - 85, The code currently calls s.Client.List into reservations and then filters with s.isMatchingCRReservation for every scheduling request, causing O(n) scans; replace this with an indexed/selector-based lookup: add a field indexer on Reservation for keys you filter by (e.g., projectID, resourceGroup, type, readiness) and query via the cache/indexer or use client.List with client.MatchingFields/client.MatchingLabels or a field selector to only return matching Reservation items instead of listing all; update the code paths that populate reservations (the variable reservations and the call site using s.Client.List) to use the indexer (e.g., reservationIndexer.ByIndex or a cached lister) so you avoid the full reservation scan while still applying s.isMatchingCRReservation for any remaining checks.
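The win behind this suggestion is easy to see with a plain in-memory index. The sketch below is not the controller-runtime API — `Reservation`, `buildIndex`, and `lookup` are invented stand-ins — it only shows why a keyed lookup beats scanning every reservation per request.

```go
package main

import "fmt"

// Reservation is a minimal stand-in for the CRD type, used only to
// illustrate the indexing idea from the review comment.
type Reservation struct {
	Name      string
	ProjectID string
	Ready     bool
}

// index maps projectID -> reservations, so a scheduling request touches
// only the candidates for its project instead of the whole list.
type index map[string][]Reservation

// buildIndex groups reservations by projectID once, up front.
func buildIndex(all []Reservation) index {
	idx := make(index)
	for _, r := range all {
		idx[r.ProjectID] = append(idx[r.ProjectID], r)
	}
	return idx
}

// lookup returns ready reservations for one project in time proportional
// to the matches, not to the total reservation count.
func lookup(idx index, projectID string) []Reservation {
	var out []Reservation
	for _, r := range idx[projectID] {
		if r.Ready {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	idx := buildIndex([]Reservation{
		{Name: "res-a", ProjectID: "p1", Ready: true},
		{Name: "res-b", ProjectID: "p1", Ready: false},
		{Name: "res-c", ProjectID: "p2", Ready: true},
	})
	fmt.Println(len(lookup(idx, "p1"))) // 1
}
```

In controller-runtime this corresponds to registering a field indexer via `mgr.GetFieldIndexer().IndexField(...)` and listing with `client.MatchingFields`; any remaining predicates (like `isMatchingCRReservation`) still apply to the narrowed result.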
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6cc970b2-763b-4a86-9023-189ff5f8d929
📒 Files selected for processing (4)
- helm/bundles/cortex-nova/alerts/nova.alerts.yaml
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
- internal/scheduling/nova/plugins/filters/filter_committed_resource_bookkeeping.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
✅ Files skipped from review due to trivial changes (1)
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
🚧 Files skipped from review as they are similar to previous changes (1)
- helm/bundles/cortex-nova/alerts/nova.alerts.yaml
```go
reservationHost := reservation.Spec.TargetHost
if reservationHost == "" {
	reservationHost = reservation.Status.Host
}
```
Prefer Status.Host over Spec.TargetHost when resolving reservation host.
For ready reservations, Status.Host represents the effective placement; using Spec.TargetHost first can misclassify candidate membership if desired and actual host diverge.
Proposed host resolution order

```diff
- reservationHost := reservation.Spec.TargetHost
- if reservationHost == "" {
- 	reservationHost = reservation.Status.Host
- }
+ reservationHost := reservation.Status.Host
+ if reservationHost == "" {
+ 	reservationHost = reservation.Spec.TargetHost
+ }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
reservationHost := reservation.Status.Host
if reservationHost == "" {
	reservationHost = reservation.Spec.TargetHost
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@internal/scheduling/nova/plugins/filters/filter_committed_resource_bookkeeping.go`
around lines 223 - 226, The code currently sets reservationHost from
reservation.Spec.TargetHost falling back to reservation.Status.Host; change the
resolution to prefer reservation.Status.Host first and only use
reservation.Spec.TargetHost if Status.Host is empty so ready reservations use
the effective placement. Update the logic around reservationHost (variables:
reservationHost, reservation.Spec.TargetHost, reservation.Status.Host) to check
Status.Host first and then Spec.TargetHost as the fallback.
```go
if err := s.Client.Update(context.Background(), reservationCopy); err != nil {
	traceLog.Warn("failed to update reservation with VM allocation",
		"reservation", reservation.Name,
		"instanceUUID", instanceUUID,
		"error", err)
	// Continue with other reservations - this is best-effort
} else {
```
Handle update conflicts to avoid silently dropping allocations.
Concurrent schedulings can race on the same reservation; on conflict, this code currently logs and skips, which can permanently miss VM allocation bookkeeping for that request.
Suggested conflict-safe update pattern

```diff
+import (
+	"k8s.io/apimachinery/pkg/types"
+	"k8s.io/client-go/util/retry"
+)
 ...
-	if err := s.Client.Update(context.Background(), reservationCopy); err != nil {
+	err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
+		latest := &v1alpha1.Reservation{}
+		if getErr := s.Client.Get(
+			context.Background(),
+			types.NamespacedName{Name: reservation.Name, Namespace: reservation.Namespace},
+			latest,
+		); getErr != nil {
+			return getErr
+		}
+
+		if latest.Spec.CommittedResourceReservation == nil {
+			return nil
+		}
+		if latest.Spec.CommittedResourceReservation.Allocations == nil {
+			latest.Spec.CommittedResourceReservation.Allocations = make(map[string]v1alpha1.CommittedResourceAllocation)
+		}
+		if _, exists := latest.Spec.CommittedResourceReservation.Allocations[instanceUUID]; exists {
+			return nil
+		}
+
+		latest.Spec.CommittedResourceReservation.Allocations[instanceUUID] = v1alpha1.CommittedResourceAllocation{
+			CreationTimestamp: now,
+			Resources:         vmResources,
+		}
+		return s.Client.Update(context.Background(), latest)
+	})
+	if err != nil {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@internal/scheduling/nova/plugins/filters/filter_committed_resource_bookkeeping.go`
around lines 256 - 262, The update currently logs and skips on any error from
s.Client.Update when writing reservationCopy, which can lose VM allocation
updates under concurrent conflicts; change this to a conflict-safe retry loop:
fetch the latest reservation (use s.Client.Get or the same client used
elsewhere), apply the allocation mutation to the fresh object, and attempt
s.Client.Update again, retrying a few times only on API conflict errors (detect
via apierrors.IsConflict(err)) and breaking on success or non-conflict errors;
preserve the existing traceLog.Warn/traceLog.Info messages but include the
retry/failure path for reservation.Name and instanceUUID so allocations aren't
silently dropped.
I will refactor w/o using nova client..
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a91fb0de-2954-4d7f-a880-f125fa7a905d
📒 Files selected for processing (2)
- docs/reservations/committed-resource-reservations.md
- internal/scheduling/reservations/commitments/controller.go
| **Change API / Syncer** | CR Create, Resize, Delete | Immediate/Hourly | Create/update/delete Reservation CRDs |
| **Controller** | Placement | On creation | Find host via scheduler API, set `TargetHost` |
| **Controller** | Optimize unused slots | >> minutes | Assign PAYG VMs or re-place reservations |
Mark this row as planned, not implemented.
internal/scheduling/reservations/commitments/controller.go, Line 118 still tracks periodic unused-slot re-placement as TODO, so this table currently documents behavior the controller does not provide yet.
💡 Possible wording

```diff
-| **Controller** | Optimize unused slots | >> minutes | Assign PAYG VMs or re-place reservations |
+| **Controller** | Optimize unused slots (planned) | N/A | TODO: assign PAYG VMs or re-place reservations |
```

🤖 Prompt for AI Agents
+| **Controller** | Optimize unused slots (planned) | N/A | TODO: assign PAYG VMs or re-place reservations |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reservations/committed-resource-reservations.md` around lines 65 - 67,
The docs table incorrectly claims the controller implements periodic "Optimize
unused slots" behavior; change that row to indicate this feature is planned (not
implemented) and update the timing and notes accordingly (e.g., mark timing as
"Planned/TBD" and replace "Assign PAYG VMs or re-place reservations" with a
short "Planned: see TODO in controller"). Also add a brief pointer that the
actual TODO remains in
internal/scheduling/reservations/commitments/controller.go (the periodic
re-placement note around the TODO) so readers know where the in-progress work is
tracked.
```go
	novaServer, err := r.NovaClient.Get(ctx, vmUUID)
	if err == nil && novaServer.ComputeHost != "" {
		// VM is on a different host past the grace period - remove from this reservation.
		// The grace period is the window for VMs in transit; past that, a VM on the
		// wrong host is no longer consuming this slot.
		logger.Info("VM on different host past grace period, removing from reservation",
			"vm", vmUUID,
			"actualHost", novaServer.ComputeHost,
			"expectedHost", expectedHost,
			"allocationAge", allocationAge)
		// fall through to removal
	} else {
		var notFound *novaservers.ErrServerNotFound
		if err != nil && !errors.As(err, &notFound) {
			// Transient Nova API error - skip removal to avoid evicting live VMs
			return nil, fmt.Errorf("nova API error checking VM %s: %w", vmUUID, err)
		}
		// err is nil (VM has no host yet) or 404 (VM gone) - fall through to removal
		logger.V(1).Info("Nova API confirmed VM not found or has no host",
			"vm", vmUUID,
			"error", err)
	}
}
// VM not found on hypervisor and not in Nova - mark for removal (leaving VM)
allocationsToRemove = append(allocationsToRemove, vmUUID)
```
Check expectedHost before treating the Nova fallback as a migration.
Line 456 only tests ComputeHost != "", so any Nova-confirmed server falls through to removal below—even when Nova says the VM is still on expectedHost. If the Hypervisor CRD lags, this drops live allocations from Spec.
💡 Proposed fix

```diff
-	if err == nil && novaServer.ComputeHost != "" {
-		// VM is on a different host past the grace period - remove from this reservation.
-		// The grace period is the window for VMs in transit; past that, a VM on the
-		// wrong host is no longer consuming this slot.
-		logger.Info("VM on different host past grace period, removing from reservation",
-			"vm", vmUUID,
-			"actualHost", novaServer.ComputeHost,
-			"expectedHost", expectedHost,
-			"allocationAge", allocationAge)
-		// fall through to removal
-	} else {
+	if err == nil {
+		switch {
+		case novaServer.ComputeHost == expectedHost && expectedHost != "":
+			newStatusAllocations[vmUUID] = expectedHost
+			logger.V(1).Info("verified VM allocation via Nova fallback",
+				"vm", vmUUID,
+				"host", expectedHost)
+			continue
+		case novaServer.ComputeHost != "":
+			// VM is on a different host past the grace period - remove from this reservation.
+			logger.Info("VM on different host past grace period, removing from reservation",
+				"vm", vmUUID,
+				"actualHost", novaServer.ComputeHost,
+				"expectedHost", expectedHost,
+				"allocationAge", allocationAge)
+			// fall through to removal
+		}
+	} else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/controller.go` around lines 455
- 479, The code treats any Nova lookup with a non-empty novaServer.ComputeHost
as a migration and falls through to removal even if Nova reports the VM is still
on expectedHost; update the check in the block that calls r.NovaClient.Get(ctx,
vmUUID) so it only treats the VM as on a different host when
novaServer.ComputeHost != "" AND novaServer.ComputeHost != expectedHost (i.e.,
compare novaServer.ComputeHost to expectedHost before logging the "VM on
different host past grace period" message and before falling through to
removal); keep the existing error handling around errors.As(...,
&novaservers.ErrServerNotFound) and ensure allocationsToRemove is only appended
for vmUUID when Nova confirms it's absent or on a different host than expected.
```go
res.Status.CommittedResourceReservation.Allocations = newStatusAllocations

// Patch Spec if changed (stale allocations removed)
if specChanged {
	if err := r.Patch(ctx, res, client.MergeFrom(old)); err != nil {
		if client.IgnoreNotFound(err) == nil {
			return result, nil
		}
		return nil, fmt.Errorf("failed to patch reservation spec: %w", err)
	}
	// Re-fetch to get the updated resource version for status patch
	if err := r.Get(ctx, client.ObjectKeyFromObject(res), res); err != nil {
		if client.IgnoreNotFound(err) == nil {
			return result, nil
		}
		return nil, fmt.Errorf("failed to re-fetch reservation: %w", err)
	}
	old = res.DeepCopy()
}

// Patch Status
patch := client.MergeFrom(old)
if err := r.Status().Patch(ctx, res, patch); err != nil {
```
Reapply newStatusAllocations after the refetch.
Line 514 overwrites res with the fresh API object, which discards the assignment from Line 503. When specChanged is true, the Status().Patch() below becomes a no-op and Status.CommittedResourceReservation.Allocations stays stale until the next reconcile.
💡 Proposed fix

```diff
-	// Update Status.Allocations
-	res.Status.CommittedResourceReservation.Allocations = newStatusAllocations
-
 	// Patch Spec if changed (stale allocations removed)
 	if specChanged {
 		if err := r.Patch(ctx, res, client.MergeFrom(old)); err != nil {
 			if client.IgnoreNotFound(err) == nil {
 				return result, nil
@@
 		if err := r.Get(ctx, client.ObjectKeyFromObject(res), res); err != nil {
 			if client.IgnoreNotFound(err) == nil {
 				return result, nil
 			}
 			return nil, fmt.Errorf("failed to re-fetch reservation: %w", err)
 		}
-		old = res.DeepCopy()
 	}
+	old = res.DeepCopy()
+	if res.Status.CommittedResourceReservation == nil {
+		res.Status.CommittedResourceReservation = &v1alpha1.CommittedResourceReservationStatus{}
+	}
+	res.Status.CommittedResourceReservation.Allocations = newStatusAllocations
+
 	// Patch Status
 	patch := client.MergeFrom(old)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Patch Spec if changed (stale allocations removed)
if specChanged {
	if err := r.Patch(ctx, res, client.MergeFrom(old)); err != nil {
		if client.IgnoreNotFound(err) == nil {
			return result, nil
		}
		return nil, fmt.Errorf("failed to patch reservation spec: %w", err)
	}
	// Re-fetch to get the updated resource version for status patch
	if err := r.Get(ctx, client.ObjectKeyFromObject(res), res); err != nil {
		if client.IgnoreNotFound(err) == nil {
			return result, nil
		}
		return nil, fmt.Errorf("failed to re-fetch reservation: %w", err)
	}
}
old = res.DeepCopy()
if res.Status.CommittedResourceReservation == nil {
	res.Status.CommittedResourceReservation = &v1alpha1.CommittedResourceReservationStatus{}
}
res.Status.CommittedResourceReservation.Allocations = newStatusAllocations

// Patch Status
patch := client.MergeFrom(old)
if err := r.Status().Patch(ctx, res, patch); err != nil {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/controller.go` around lines 503
- 525, When specChanged triggers a re-fetch, the earlier assignment
res.Status.CommittedResourceReservation.Allocations = newStatusAllocations is
lost because r.Get overwrites res; after the re-fetch (after the r.Get call and
before creating the MergeFrom patch/old snapshot), reapply
res.Status.CommittedResourceReservation.Allocations = newStatusAllocations (or
reassign it immediately after old = res.DeepCopy() but before creating patch :=
client.MergeFrom(old)) so that the subsequent patch := client.MergeFrom(old) and
r.Status().Patch(ctx, res, patch) include the updated allocations; update code
around r.Get, old = res.DeepCopy(), and patch creation accordingly.
Test Coverage Report — 📊 68.8%
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/reservations/committed-resource-reservations.md (1)
**41-42:** Consider clarifying the CRD-to-slot relationship.

The terminology "reservation CRD," "reservation slot," and "CR entry" could be clearer. Readers may not immediately understand that a "CR entry" (from the customer's perspective) maps to multiple "reservation CRDs" (from the Cortex perspective).

📝 Suggested wording improvement

```diff
-A reservation CRD represents a single reservation slot on a hypervisor, which holds multiple VMs.
-A single CR entry typically refers to multiple reservation CRDs (slots).
+Each Reservation CRD represents one slot on a hypervisor that can hold multiple VMs.
+A customer's committed resource (CR) entry typically maps to multiple reservation slots (multiple Reservation CRDs).
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reservations/committed-resource-reservations.md` around lines 41 - 42, Clarify the CRD-to-slot mapping by explicitly defining the three terms used: "reservation CRD" (the Cortex-side CRD representing a single hypervisor slot), "reservation slot" (the actual slot on a hypervisor that can host VMs), and "CR entry" (the customer-facing reservation request); then add a sentence that plainly states the mapping (e.g., "One CR entry from the customer's perspective may correspond to multiple reservation CRDs (slots) in Cortex, so a single CR entry can map to N reservation CRDs/slots"). Update the existing lines that mention "reservation CRD", "reservation slot", and "CR entry" to use these definitions and include a short concrete example to make the relationship unambiguous.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 251709a9-856d-4875-949e-5b02929eefa4
📒 Files selected for processing (1)
docs/reservations/committed-resource-reservations.md
```text
confirmed = sum of resources for VMs in both Spec.Allocations and Status.Allocations
spec_only_unblocked = sum of resources for VMs in Spec.Allocations only, NOT having an active pessimistic blocking reservation on this host
remaining = max(0, Spec.Resources - confirmed)
block = max(remaining, spec_only_unblocked)
```
Add language specification to the code block.
The fenced code block should specify a language identifier for proper markdown rendering.
🔧 Proposed fix

````diff
-```
+```text
 confirmed = sum of resources for VMs in both Spec.Allocations and Status.Allocations
 spec_only_unblocked = sum of resources for VMs in Spec.Allocations only, NOT having an active pessimistic blocking reservation on this host
 remaining = max(0, Spec.Resources - confirmed)
 block = max(remaining, spec_only_unblocked)
````
As per coding guidelines: Documentation should be written in markdown format.
🧰 Tools: 🪛 markdownlint-cli2 (0.22.0)

[warning] 142-142: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/reservations/committed-resource-reservations.md` around lines 142 - 147, the fenced code block showing the reservation formulas is missing a language identifier; update the opening fence from a bare fence to a `text`-tagged fence (or another appropriate language tag) so the block containing "confirmed = sum of resources..." through "block = max(remaining, spec_only_unblocked)" renders correctly in Markdown; ensure only the opening fence is changed and the content and closing fence remain the same.
No description provided.