[scheduler] Add scheduling mode: E-PVM by DiegoTavares · Pull Request #2398 · AcademySoftwareFoundation/OpenCue

DiegoTavares · 2026-06-08T23:50:28Z

Add new scheduling mode to scheduler module that implements E-PVM, inspired by the work of Aghiles' scheduler.

Summary by CodeRabbit

Release Notes

New Features
- Added per-show scheduler ownership control via cueadmin -scheduler-managed <show> on|off command
- Added scheduler_managed property to shows via gRPC API and Python wrappers
- Introduced Redis-backed accounting for scheduler-managed shows
Configuration
- Added Redis accounting configuration options (accounting.redis.enabled, accounting.redis.host, accounting.redis.port)
- Extended scheduler queue configuration with new timing and sizing parameters
Documentation
- Updated deployment guides and technical references for per-show scheduler ownership model
Version
- Bumped version from 1.24 to 1.25

Summary by CodeRabbit

Release Notes

New Features
- Integrated Redis-based resource accounting system for enhanced booking and limit management
- Added E-PVM (energy-performance-value-matrix) host selection strategy alongside existing saturation-based selection
- Enhanced cluster scheduling with round-trip tracking and intelligent sleep management
Improvements
- Expanded host selection with profile-aware placement and configurable scoring weights
- Added new metrics for resource accounting limits and placement performance monitoring
- Enhanced scheduler configuration with new queue timing, dispatch caps, and resource reservation parameters

coderabbitai · 2026-06-08T23:50:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4bcff7e-6bd2-4b33-aec7-709519ddb18a

📥 Commits

Reviewing files that changed from the base of the PR and between eac7ebf and 3147111.

📒 Files selected for processing (1)

cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java

✅ Files skipped from review due to trivial changes (1)

cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java

📝 Walkthrough

Walkthrough

The PR integrates Redis-backed resource accounting and placement strategy scoring into the scheduler, replacing prior resource accounting services. Facility identifiers become strings throughout the data model. The host cache checkout API changes to accept placement profiles and scoring gates, supporting both saturation (first-fit) and E-PVM (lowest-score) strategies. Cluster feed streaming is refactored with control-flow resilience and round-trip timing. Matcher logic now performs accounting prechecks and constructs placement contexts for strategy-aware host selection.

Changes

Scheduler-managed accounting and placement integration

Layer / File(s)	Summary
Configuration and booking strategy contracts `rust/crates/scheduler/src/config/mod.rs`, `rust/config/scheduler.yaml`, `rust/crates/scheduler/Cargo.toml`	New `AccountingConfig` and `RedisConfig` with cadence/TTL fields; `QueueConfig` adds `cluster_empty_sleep`, dispatch caps, and memory reservation knobs; `HostBookingStrategy` refactored into tagged enum with `Saturation` and `Epvm` variants; `ScoreWeights` struct introduced for E-PVM scoring parameters; YAML docs expanded to describe booking strategies and configuration defaults.
Facility/tag/layer data shape and DAO transitions `rust/crates/scheduler/src/cluster.rs`, `rust/crates/scheduler/src/cluster_key.rs`, `rust/crates/scheduler/src/models/layer.rs`, `rust/crates/scheduler/src/dao/cluster_dao.rs`, `rust/crates/scheduler/src/dao/layer_dao.rs`	`Cluster.facility_id`, `Tag`, and `ClusterKey` switch to string-based facility IDs; `Tag` adds optional `alloc_id` field; `DispatchLayer` adds `folder_id`, `dept_id`, and `job_max_cores` fields; cluster loading logic updates tag aggregation keying and `alloc_id` population for ALLOC-type tags; DAO queries select/project facility and job-cap fields; `get_facility_id` returns `String` directly from DB instead of parsing `Uuid`.
Redis accounting client and Lua result parsing `rust/crates/scheduler/src/accounting/redis_client.rs`	New async Redis client `RedisAccounting` with `ReseedOp`, `BookOutcome`, and `BookMode` types; implements `connect`, `book`, `reseed_cas`, `get_seq`, `read_sub_counters`, `read_job_cores_in_use` operations; strict `parse_book_outcome` validator ensures Lua BOOK_OR_FORCE responses match expected shapes (Applied or LimitExceeded).
Placement profile, scoring, and gates `rust/crates/scheduler/src/pipeline/placement.rs`, `rust/crates/scheduler/src/pipeline/mod.rs`	New module defining `LayerProfile` (layer snapshot with floor/compatibility and E-PVM context); host-matching helpers for OS and thread-mode validation; `fits_floor` predicate checking idle resources; `compute_max_more` headroom computation; `placement_score` using W3 normalization with configurable weights; `saturation_gate` (first-fit validate-only) and `epvm_gate` (validate + score); comprehensive unit tests and `proptest` properties for scoring monotonicity and non-negativity.
Host cache checkout API and strategy execution `rust/crates/scheduler/src/host_cache/cache.rs`, `rust/crates/scheduler/src/host_cache/actor.rs`, `rust/crates/scheduler/src/host_cache/messages.rs`, `rust/crates/scheduler/src/host_cache/mod.rs`	`CheckOut` message becomes non-generic, carrying `profile` and `gate` instead of generic validation closure; facility ID becomes `String`; `HostCache::check_out` dispatches to Saturation (first-fit) or Epvm (K-best candidate scoring) strategies; new `ScoredCandidate` struct and `remove_host_best` implementation for E-PVM; actor adds `refresh_in_progress` RAII guard for concurrency control; `CacheRatioResponse` simplified to `hit_ratio` only; reservation-aware checkout with expired reservation treatment; test suite updated with profile/gate helpers and new Epvm-specific tests.
Cluster feed and matcher runtime orchestration `rust/crates/scheduler/src/cluster.rs`, `rust/crates/scheduler/src/pipeline/entrypoint.rs`, `rust/crates/scheduler/src/pipeline/matcher.rs`, `rust/crates/scheduler/src/metrics/mod.rs`	Cluster feed producer/receiver wrapped with `AssertUnwindSafe`/`catch_unwind` and `ControlFlow` for panic resilience; `last_sent_map` tracks round-trip timestamps with `observe_cluster_round_trip` metrics; receiver processes `FeedMessage::Sleep` and `FeedMessage::Stop` cleanly; entrypoint initializes accounting service and bootstrap reseed; matcher adds `feed_sender` parameter for cluster sleep signaling; `process_layer` performs Redis snapshot reads (job cores, subscription counters), derives allocation context, computes burst/cores-in-use, executes pre-checkout burst checks; host checkout constructs `LayerProfile`, selects strategy gate, tracks frame consumption and updates booked counters; new metrics for accounting-limit rejections, cluster round-trip, placement score, and E-PVM inner-retries.
Supporting scheduler tests and Cuebot doc note `rust/crates/scheduler/tests/util.rs`, `cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java`	Test utilities updated: `create_test_config` adds `cluster_empty_sleep` and `AccountingConfig::default()`; `create_test_data` passes facility ID as string and explicitly populates tag `alloc_id` fields (None for manual/hostname/hardware, Some for allocation tags); Cuebot `DispatchQuery.FIND_SHOWS` javadoc clarifies that scheduler-managed shows are excluded because dispatch is owned by standalone Rust scheduler.

Sequence Diagram(s)

sequenceDiagram
  participant Matcher as Matcher Layer<br/>Processing
  participant Redis as Redis<br/>Accounting
  participant HostCache as Host Cache<br/>Actor
  participant HostDB as Host DB<br/>Lookup
  
  Matcher->>Redis: read job cores_in_use
  Redis-->>Matcher: job_cores_in_use (or 0)
  Matcher->>Redis: read show/alloc subscription
  Redis-->>Matcher: (int_cores, burst)
  Matcher->>Matcher: compute profile (burst,<br/>cores_in_use, weights)
  Matcher->>Matcher: select Saturation<br/>or Epvm strategy
  alt pre-checkout burst limit exceeded
    Matcher->>Matcher: emit FeedMessage::Sleep
    Matcher->>Redis: increment<br/>accounting_limit_exceeded
    Note over Matcher: Early return, no host checkout
  else proceed to checkout
    Matcher->>HostCache: send CheckOut<br/>(profile, gate, facility_id)
    HostCache->>HostDB: fetch candidate hosts<br/>by show/facility/tag
    HostDB-->>HostCache: candidate list
    alt Saturation strategy
      HostCache->>HostCache: first valid host<br/>gate(host, profile)=Some()
    else Epvm strategy
      HostCache->>HostCache: K-best by lowest score<br/>min(gate(host, profile))
      HostCache->>HostCache: CAS removal with retries
      Note over HostCache: Record inner-retries metric
    end
    HostCache-->>Matcher: Host or error
    Matcher->>Matcher: dispatch frames,<br/>track consumption
    Matcher->>Matcher: update booked counters<br/>for next iteration
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

[rust-scheduler] Performance and Architectural Optimizations for Cue-Scheduler #2318: This PR implements the host_cache refactor (checkout algorithm, reservation handling, actor behavior, and metrics) described in the performance/architecture proposal.

Possibly related PRs

AcademySoftwareFoundation/OpenCue#2323: Both PRs touch cuebot/.../dao/postgres/DispatchQuery.FIND_SHOWS to ensure scheduler-managed shows are excluded from Cuebot booking/dispatch.
AcademySoftwareFoundation/OpenCue#2198: This PR implements scheduler-side (Redis-backed) resource accounting and scheduler-managed-show dispatch, continuing the shift away from Cuebot's per-dispatch resource table updates.
AcademySoftwareFoundation/OpenCue#2328: Both PRs change scheduler layer-to-host matching logic; this PR introduces placement primitives including host_matches_thread_mode/validate_os_and_thread_mode.

Suggested reviewers

lithorus
ramonfigueiredo

Poem

🐰 A leap through Redis clouds we go,
Where placement scores and strategies flow,
Saturation first, E-PVM best,
The scheduler's booking put to the test!
Facility strings and profiles bright,
Host cache now chooses with weighted might.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[scheduler] Add scheduling mode: E-PVM' directly and clearly summarizes the main change. The PR adds a new E-PVM scheduling mode to the scheduler, which is the primary objective.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (4)

rust/crates/scheduler/src/config/mod.rs (1)

101-105: 💤 Low value

Consider adding authentication support to Redis URL construction.

The url() method constructs a basic redis://host:port/ URL without authentication. If production deployments require Redis AUTH, this method may need enhancement to support username/password or connection string overrides.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/config/mod.rs` around lines 101 - 105, The
RedisConfig::url() method currently returns "redis://host:port/" with no auth
support; update RedisConfig (struct) to include optional credentials (e.g.,
username: Option<String>, password: Option<String>) or an optional
raw_connection_string: Option<String>, and modify RedisConfig::url() to: if
raw_connection_string is Some return it; otherwise build the URL including
"redis://username:password@host:port/" when credentials are present (omit
username if only password needed per Redis URI rules) and percent-encode
credentials as necessary to avoid malformed URLs. Ensure you reference the
RedisConfig struct and the url() method so callers keep backwards compatibility
by keeping fields optional.

rust/crates/scheduler/src/pipeline/matcher.rs (1)

306-312: ⚡ Quick win

Handle FeedMessage::Sleep send failures explicitly.

Ignoring the send result hides closed-channel failures; then this over-burst path returns without applying any backoff locally.

Suggested patch

-            let _ = feed_sender
-                .send(FeedMessage::Sleep(
-                    (*cluster).clone(),
-                    CONFIG.queue.cluster_empty_sleep,
-                ))
-                .await;
+            if let Err(err) = feed_sender
+                .send(FeedMessage::Sleep(
+                    (*cluster).clone(),
+                    CONFIG.queue.cluster_empty_sleep,
+                ))
+                .await
+            {
+                debug!(
+                    "Failed to enqueue cluster sleep for {} (over-burst path): {}",
+                    cluster, err
+                );
+                tokio::time::sleep(CONFIG.queue.cluster_empty_sleep).await;
+            }
             return;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/pipeline/matcher.rs` around lines 306 - 312, The
code currently ignores the result of
feed_sender.send(FeedMessage::Sleep((*cluster).clone(),
CONFIG.queue.cluster_empty_sleep)).await; change this to handle the Result
explicitly: if send returns Ok proceed as before, but if Err (channel closed)
log the failure (include context like cluster and that FeedMessage::Sleep
failed) and apply a local backoff before returning (e.g., await
tokio::time::sleep(Duration::from_millis(CONFIG.queue.cluster_empty_sleep)) or
equivalent) so the over-burst path does not immediately return without backoff;
update matcher.rs around the feed_sender send call to match on the Result and
handle Err accordingly.

rust/crates/scheduler/src/dao/resource_accounting_dao.rs (1)

149-152: 💤 Low value

Redundant p.pk_show = ANY($1) filter in point recompute query.

Line 151 applies p.pk_show = ANY($1) in the LEFT JOIN condition, but p.pk_show = pt2.pk_show is already constrained by the join, and pt2.pk_show = ANY($1) at line 152 already filters the point rows. The filter at line 151 is redundant.

Not a correctness issue, but removing it would simplify the query slightly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/dao/resource_accounting_dao.rs` around lines 149 -
152, Remove the redundant p.pk_show = ANY($1) predicate from the LEFT JOIN on
proc (alias p) in the point recompute SQL: the join already enforces p.pk_show =
pt2.pk_show and the WHERE clause pt2.pk_show = ANY($1) filters pt2, so delete
the p.pk_show = ANY($1) condition in the LEFT JOIN to simplify the query; update
the SQL string in resource_accounting_dao.rs where the LEFT JOIN proc p ... AND
p.pk_show = ANY($1) appears and ensure spacing/AND chaining remains valid after
removal.

pycue/tests/wrappers/test_show.py (1)

191-201: ⚡ Quick win

Cover both toggle directions in the RPC assertion.

testSetSchedulerManaged currently validates only enabled=True. Adding enabled=False in the same test tightens contract coverage for the new toggle RPC with minimal effort.

♻️ Suggested test update

     def testSetSchedulerManaged(self, getStubMock):
         stubMock = mock.Mock()
         stubMock.SetSchedulerManaged.return_value = show_pb2.ShowSetSchedulerManagedResponse()
         getStubMock.return_value = stubMock
 
         show = opencue.wrappers.show.Show(show_pb2.Show(name=TEST_SHOW_NAME))
-        show.setSchedulerManaged(True)
-
-        stubMock.SetSchedulerManaged.assert_called_with(
-            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True),
-            timeout=mock.ANY)
+        show.setSchedulerManaged(True)
+        show.setSchedulerManaged(False)
+
+        stubMock.SetSchedulerManaged.assert_any_call(
+            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True),
+            timeout=mock.ANY)
+        stubMock.SetSchedulerManaged.assert_any_call(
+            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=False),
+            timeout=mock.ANY)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pycue/tests/wrappers/test_show.py` around lines 191 - 201, The test only
asserts the RPC for enabling the scheduler; update testSetSchedulerManaged to
exercise both toggle directions by invoking show.setSchedulerManaged(True) and
show.setSchedulerManaged(False) (or call once and then again) and assert the
stubMock.SetSchedulerManaged was called for both cases using either two
assert_called_with checks referencing
show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True) and
...enabled=False) or a single assert_has_calls with both expected requests;
target symbols: testSetSchedulerManaged, show.setSchedulerManaged,
stubMock.SetSchedulerManaged, and ShowSetSchedulerManagedRequest.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cuebot/build.gradle`:
- Around line 41-44: Update the Lettuce dependency to a non-vulnerable release
by changing the dependency declaration for io.lettuce:lettuce-core (currently
'6.3.2.RELEASE') to at least '6.5.1.RELEASE' (preferably the latest, e.g.
'6.7.1.RELEASE'); keep the explicit reactor-core override for
io.projectreactor:reactor-core at the 3.6.x line (e.g. '3.6.5') so Lettuce
compatibility is preserved and remove/replace the old version reference wherever
io.lettuce:lettuce-core '6.3.2.RELEASE' is declared.

In
`@cuebot/src/main/resources/conf/ddl/postgres/migrations/V44__scheduler_pending_query_indexes.sql`:
- Line 20: The manual backfill example in
V44__scheduler_pending_query_indexes.sql incorrectly inserts version 40 into
flyway_schema_history; update the VALUES tuple to use 44 to match this migration
(ensure the INSERT INTO flyway_schema_history(... ) VALUES (..., 44, ... )
reflects the V44__scheduler_pending_query_indexes.sql migration) so the
flyway_schema_history row matches the migration version and prevents poisoning
subsequent runs.

In `@docs/_docs/developer-guide/redis-accounting.md`:
- Around line 247-253: The two unlabeled fenced code blocks that contain the
numbered acct booking steps and the cueadmin command examples should be given
explicit languages to satisfy markdownlint MD040: change the fence wrapping the
numbered list (the block starting "1. Read current state of acct:sub /
acct:folder / acct:job / acct:layer / acct:point") to use a text label (e.g.,
```text) and change the fence wrapping the cueadmin examples (the block with
"cueadmin -show <name> -setSchedulerManaged true/false") to use a shell label
(e.g., ```bash); update both occurrences (the one around the acct steps and the
one around the cueadmin commands) accordingly.
- Line 14: The heading "How Cuebot and the Rust scheduler coordinate per-show
resource accounting through Redis" is using ### causing a level jump from # to
### and failing MD001; change that heading token from "###" to "##" so it
follows the top-level "#" and fixes the markdown lint error (update the heading
line where that exact text appears).

In `@docs/_docs/developer-guide/scheduler.md`:
- Around line 648-654: The fenced code block listing the scheduler steps
(starting "1. Read current state of acct:sub / acct:folder / acct:job /
acct:layer / acct:point" through "5. Then transactionally INSERT proc in
Postgres (outside Lua)") is unlabeled and triggers MD040; add a language label
to the opening triple-backtick (e.g., ```text) to satisfy the linter and
preserve formatting so the block is treated as a code/preformatted block.

In `@docs/_docs/getting-started/deploying-scheduler.md`:
- Around line 262-265: The docs contain inconsistent cueadmin usage: keep the
correct form "cueadmin -scheduler-managed <SHOW> ON|OFF" and replace any
incorrect examples using "-show <name> -setSchedulerManaged true|false" (look
for occurrences of the flags -setSchedulerManaged and -show in docs, e.g.,
developer-guide scheduler and redis-accounting sections); update those examples
to use the exact flag and value format "-scheduler-managed <SHOW> ON" or "OFF"
so all docs match the working CLI syntax.

In `@pycue/opencue/wrappers/show.py`:
- Around line 223-235: setSchedulerManaged currently calls
stub.SetSchedulerManaged but doesn't update the wrapper's local state, so
subsequent schedulerManaged() reads can be stale; after calling
SetSchedulerManaged (in setSchedulerManaged), assign the returned value to
self.data.scheduler_managed (or set self.data.scheduler_managed = value if the
RPC returns no body) using the response from stub.SetSchedulerManaged to
synchronize the wrapper, ensuring schedulerManaged() reflects the new state.

In `@rust/crates/scheduler/src/accounting/dao.rs`:
- Around line 146-157: The query_managed_show_ids function currently maps DB
rows using parse_uuid which unwraps and will panic on malformed UUIDs; change
the handling to use a fallible parse instead: update parse_uuid (in
dao/helpers.rs) to return Result<Uuid, Report> or provide a new try_parse_uuid
that calls Uuid::parse_str and returns an Err on failure, then modify
query_managed_show_ids (and other callers) to collect a Result<Vec<Uuid>, _> or
filter/log and skip invalid rows rather than calling unwrap—propagate the error
with wrap_err for diagnostics or log and skip malformed rows so a single corrupt
DB value cannot crash the scheduler.

In `@rust/crates/scheduler/src/accounting/recompute.rs`:
- Around line 46-52: The current recompute task exits permanently if
ResourceAccountingDao::new().await returns Err, causing a long-lived outage;
change the initialization to retry instead of returning: wrap the call to
ResourceAccountingDao::new().await (the code creating pg_dao/Arc::new(d)) in a
retry loop with exponential backoff (or fixed delay) and error logging, continue
retrying until success, and only proceed to create pg_dao and enter the
recompute loop after a successful Result; ensure any temporary failures do not
cause an early return from the recompute task.

In `@rust/crates/scheduler/src/accounting/redis_client.rs`:
- Around line 142-144: Update the unit wording in the Redis client docs: change
"centicores" to "cores" in the doc comment that begins "Reads the subscription
hash's booked cores + burst in one round-trip..." and the other similar doc
comment around the subsequent block (the occurrences currently at the same doc
block and the one at lines ~161-162). Ensure both doc comments and any inline
mention in redis_client.rs consistently state that Redis counters operate in
cores (not centicores) so the Lua contract and callers are not misled.
- Line 155: The call currently uses conn.hget(&key, &["int_cores",
"burst"]).await? which is a single-field read API but is passing multiple
fields; change this to the multi-field API conn.hmget(&key, &["int_cores",
"burst"]).await? (keeping the expected type Vec<Option<i64>> and the surrounding
error handling) so the Redis client sends HMGET for the fields "int_cores" and
"burst" and maps into values as before.

In `@rust/crates/scheduler/src/cluster.rs`:
- Around line 418-423: The code can panic when current_index_atomic contains an
out-of-range value after self.clusters shrinks; before indexing clusters use the
current clusters length to clamp the index (and handle empty clusters).
Specifically, in the block around current_index_atomic.load, get let len =
clusters.len(); if len == 0 return/skip; compute a safe_index = current_index %
len (or safe_index = min(current_index, len-1)), then read item =
clusters[safe_index].clone(); compute next_index = (safe_index + 1) % len and
store that back into current_index_atomic; this prevents indexing with stale,
too-large indices while preserving wrap-around detection and forward progress.

In `@rust/crates/scheduler/src/models/core_size.rs`:
- Around line 68-71: The CoreSize conversion functions
(CoreSize::from_multiplied and CoreSize::from_multiplied_cap) perform unchecked
i64 -> i32 narrowing using `as i32`, which can silently wrap; change them to use
a checked conversion (e.g., try_into / i32::try_from) or explicit saturation
with bounds checking against i32::MIN/i32::MAX and handle failures by returning
a Result or clamping to safe limits. Update the functions that reference
CONFIG.queue.core_multiplier to compute the intermediate i64, check for
overflow/underflow before converting, and either return an error (propagate
Result) or clamp to i32 bounds and document the behavior so no silent wrapping
occurs.

In `@rust/crates/scheduler/src/pipeline/dispatcher/actor.rs`:
- Around line 482-491: BookingDelta is being built with DispatchLayer.folder_id
and dept_id that may be Uuid::nil(), which would produce incorrect
acct:folder:{folder_id} and acct:point:{dept_id}:{show_id} Redis keys; add an
explicit nil-UUID guard before constructing BookingDelta (best place: inside
dispatch_virtual_proc right before creating BookingDelta or at the callsite in
dispatch_inner/consume_host_virtual_resources) that checks
DispatchLayer.folder_id.is_nil() and DispatchLayer.dept_id.is_nil() and
rejects/returns an Err (or skips booking) with a logged error; ensure the check
references the DispatchLayer fields and prevents creating BookingDelta when
either UUID is nil so no acct:folder or acct:point keys are written for the zero
UUID.

---

Nitpick comments:
In `@pycue/tests/wrappers/test_show.py`:
- Around line 191-201: The test only asserts the RPC for enabling the scheduler;
update testSetSchedulerManaged to exercise both toggle directions by invoking
show.setSchedulerManaged(True) and show.setSchedulerManaged(False) (or call once
and then again) and assert the stubMock.SetSchedulerManaged was called for both
cases using either two assert_called_with checks referencing
show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True) and
...enabled=False) or a single assert_has_calls with both expected requests;
target symbols: testSetSchedulerManaged, show.setSchedulerManaged,
stubMock.SetSchedulerManaged, and ShowSetSchedulerManagedRequest.

In `@rust/crates/scheduler/src/config/mod.rs`:
- Around line 101-105: The RedisConfig::url() method currently returns
"redis://host:port/" with no auth support; update RedisConfig (struct) to
include optional credentials (e.g., username: Option<String>, password:
Option<String>) or an optional raw_connection_string: Option<String>, and modify
RedisConfig::url() to: if raw_connection_string is Some return it; otherwise
build the URL including "redis://username:password@host:port/" when credentials
are present (omit username if only password needed per Redis URI rules) and
percent-encode credentials as necessary to avoid malformed URLs. Ensure you
reference the RedisConfig struct and the url() method so callers keep backwards
compatibility by keeping fields optional.

In `@rust/crates/scheduler/src/dao/resource_accounting_dao.rs`:
- Around line 149-152: Remove the redundant p.pk_show = ANY($1) predicate from
the LEFT JOIN on proc (alias p) in the point recompute SQL: the join already
enforces p.pk_show = pt2.pk_show and the WHERE clause pt2.pk_show = ANY($1)
filters pt2, so delete the p.pk_show = ANY($1) condition in the LEFT JOIN to
simplify the query; update the SQL string in resource_accounting_dao.rs where
the LEFT JOIN proc p ... AND p.pk_show = ANY($1) appears and ensure spacing/AND
chaining remains valid after removal.

In `@rust/crates/scheduler/src/pipeline/matcher.rs`:
- Around line 306-312: The code currently ignores the result of
feed_sender.send(FeedMessage::Sleep((*cluster).clone(),
CONFIG.queue.cluster_empty_sleep)).await; change this to handle the Result
explicitly: if send returns Ok proceed as before, but if Err (channel closed)
log the failure (include context like cluster and that FeedMessage::Sleep
failed) and apply a local backoff before returning (e.g., await
tokio::time::sleep(Duration::from_millis(CONFIG.queue.cluster_empty_sleep)) or
equivalent) so the over-burst path does not immediately return without backoff;
update matcher.rs around the feed_sender send call to match on the Result and
handle Err accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a7aa165-b9e9-4987-a82f-97ec0dd9c477

📥 Commits

Reviewing files that changed from the base of the PR and between e504bc9 and eac7ebf.

📒 Files selected for processing (82)

.gitignore
VERSION.in
cueadmin/cueadmin/common.py
cueadmin/tests/test_common.py
cuebot/build.gradle
cuebot/src/main/java/com/imageworks/spcue/ShowEntity.java
cuebot/src/main/java/com/imageworks/spcue/VirtualProc.java
cuebot/src/main/java/com/imageworks/spcue/config/AppConfig.java
cuebot/src/main/java/com/imageworks/spcue/dao/ShowDao.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatcherDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/ProcDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/ShowDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/WhiteboardDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/servant/ManageShow.java
cuebot/src/main/java/com/imageworks/spcue/service/AccountingRedisPublisher.java
cuebot/src/main/java/com/imageworks/spcue/service/LettuceAccountingRedisPublisher.java
cuebot/src/main/resources/conf/ddl/postgres/migrations/V44__scheduler_pending_query_indexes.sql
cuebot/src/main/resources/conf/ddl/postgres/migrations/V45__show_scheduler_managed.sql
cuebot/src/main/resources/conf/spring/applicationContext-accounting.xml
cuebot/src/main/resources/opencue.properties
cuebot/src/test/java/com/imageworks/spcue/config/TestAppConfig.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/DispatcherDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/ProcDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/ShowDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/servant/ManageShowTests.java
cuebot/src/test/java/com/imageworks/spcue/test/service/LettuceAccountingRedisPublisherTests.java
docs/_docs/developer-guide/redis-accounting.md
docs/_docs/developer-guide/scheduler.md
docs/_docs/getting-started/deploying-scheduler.md
docs/news/2025-12-12-distributed-scheduler-release.md
proto/src/show.proto
pycue/opencue/wrappers/show.py
pycue/tests/wrappers/test_show.py
rust/config/scheduler.yaml
rust/crates/scheduler/Cargo.toml
rust/crates/scheduler/src/accounting/booking_delta.rs
rust/crates/scheduler/src/accounting/bootstrap.rs
rust/crates/scheduler/src/accounting/dao.rs
rust/crates/scheduler/src/accounting/error.rs
rust/crates/scheduler/src/accounting/limit_reseed.rs
rust/crates/scheduler/src/accounting/lua.rs
rust/crates/scheduler/src/accounting/managed_shows.rs
rust/crates/scheduler/src/accounting/mod.rs
rust/crates/scheduler/src/accounting/recompute.rs
rust/crates/scheduler/src/accounting/redis_client.rs
rust/crates/scheduler/src/cluster.rs
rust/crates/scheduler/src/cluster_key.rs
rust/crates/scheduler/src/config/mod.rs
rust/crates/scheduler/src/dao/cluster_dao.rs
rust/crates/scheduler/src/dao/frame_dao.rs
rust/crates/scheduler/src/dao/host_dao.rs
rust/crates/scheduler/src/dao/job_dao.rs
rust/crates/scheduler/src/dao/layer_dao.rs
rust/crates/scheduler/src/dao/mod.rs
rust/crates/scheduler/src/dao/proc_dao.rs
rust/crates/scheduler/src/dao/resource_accounting_dao.rs
rust/crates/scheduler/src/host_cache/actor.rs
rust/crates/scheduler/src/host_cache/cache.rs
rust/crates/scheduler/src/host_cache/messages.rs
rust/crates/scheduler/src/host_cache/mod.rs
rust/crates/scheduler/src/lib.rs
rust/crates/scheduler/src/main.rs
rust/crates/scheduler/src/metrics/mod.rs
rust/crates/scheduler/src/models/core_size.rs
rust/crates/scheduler/src/models/frame.rs
rust/crates/scheduler/src/models/layer.rs
rust/crates/scheduler/src/models/mod.rs
rust/crates/scheduler/src/models/subscription.rs
rust/crates/scheduler/src/models/virtual_proc.rs
rust/crates/scheduler/src/pipeline/dispatcher/actor.rs
rust/crates/scheduler/src/pipeline/dispatcher/error.rs
rust/crates/scheduler/src/pipeline/dispatcher/mod.rs
rust/crates/scheduler/src/pipeline/entrypoint.rs
rust/crates/scheduler/src/pipeline/layer_permit.rs
rust/crates/scheduler/src/pipeline/matcher.rs
rust/crates/scheduler/src/pipeline/mod.rs
rust/crates/scheduler/src/pipeline/placement.rs
rust/crates/scheduler/src/resource_accounting.rs
rust/crates/scheduler/tests/redis_integration.rs
rust/crates/scheduler/tests/smoke_tests.rs
rust/crates/scheduler/tests/util.rs

💤 Files with no reviewable changes (5)

rust/crates/scheduler/src/resource_accounting.rs
rust/crates/scheduler/src/models/mod.rs
rust/crates/scheduler/src/models/subscription.rs
rust/crates/scheduler/src/dao/mod.rs
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatcherDaoJdbc.java

coderabbitai · 2026-06-09T00:21:23Z

+    implementation group: 'io.lettuce', name: 'lettuce-core', version: '6.3.2.RELEASE'
+    // Lettuce 6.3.x requires reactor-core 3.6.x (for reactor.core.publisher.Sinks).
+    // Spring Boot 2.2.1's dependency management would otherwise pin this to 3.3.x.
+    implementation group: 'io.projectreactor', name: 'reactor-core', version: '3.6.5'


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for security advisories on the added dependencies echo "=== Checking Lettuce security advisories ===" gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: MAVEN, package: "io.lettuce:lettuce-core") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }' echo "" echo "=== Checking Reactor Core security advisories ===" gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: MAVEN, package: "io.projectreactor:reactor-core") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }' echo "" echo "=== Checking latest Maven Central versions ===" curl -s "https://search.maven.org/solrsearch/select?q=g:io.lettuce+AND+a:lettuce-core&rows=1&wt=json" | jq '.response.docs[0].latestVersion' curl -s "https://search.maven.org/solrsearch/select?q=g:io.projectreactor+AND+a:reactor-core&rows=1&wt=json" | jq '.response.docs[0].latestVersion'

Repository: AcademySoftwareFoundation/OpenCue

Length of output: 593

Update Lettuce dependency due to known vulnerability

io.lettuce:lettuce-core:6.3.2.RELEASE falls under a known MODERATE advisory range (< 6.5.1.RELEASE: “Withdrawn Advisory: Netty vulnerability included in redis lettuce”); upgrade to io.lettuce:lettuce-core:6.5.1.RELEASE or newer (latest on Maven Central: 6.7.1.RELEASE) and keep reactor-core on the required 3.6.x line for Lettuce compatibility. io.projectreactor:reactor-core:3.6.5 did not return any advisories in the checked results.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cuebot/build.gradle` around lines 41 - 44, Update the Lettuce dependency to a non-vulnerable release by changing the dependency declaration for io.lettuce:lettuce-core (currently '6.3.2.RELEASE') to at least '6.5.1.RELEASE' (preferably the latest, e.g. '6.7.1.RELEASE'); keep the explicit reactor-core override for io.projectreactor:reactor-core at the 3.6.x line (e.g. '3.6.5') so Lettuce compatibility is preserved and remove/replace the old version reference wherever io.lettuce:lettuce-core '6.3.2.RELEASE' is declared.

Add an opt-in `Epvm` host-booking strategy alongside the existing `Saturation` strategy (which stays the default). When `Epvm` is enabled, the host cache scores candidate hosts using E-PVM stranding and picks the lowest-scoring host within a configurable scan cap, replacing today's first-fit-by-saturation iteration for the same call.

DiegoTavares · 2026-06-09T22:35:55Z

Actionable comments posted: 14

🧹 Nitpick comments (4)

rust/crates/scheduler/src/config/mod.rs (1)> 101-105: 💤 Low value

Consider adding authentication support to Redis URL construction.
The url() method constructs a basic redis://host:port/ URL without authentication. If production deployments require Redis AUTH, this method may need enhancement to support username/password or connection string overrides.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/config/mod.rs` around lines 101 - 105, The
RedisConfig::url() method currently returns "redis://host:port/" with no auth
support; update RedisConfig (struct) to include optional credentials (e.g.,
username: Option<String>, password: Option<String>) or an optional
raw_connection_string: Option<String>, and modify RedisConfig::url() to: if
raw_connection_string is Some return it; otherwise build the URL including
"redis://username:password@host:port/" when credentials are present (omit
username if only password needed per Redis URI rules) and percent-encode
credentials as necessary to avoid malformed URLs. Ensure you reference the
RedisConfig struct and the url() method so callers keep backwards compatibility
by keeping fields optional.

rust/crates/scheduler/src/pipeline/matcher.rs (1)> 306-312: ⚡ Quick win

Handle FeedMessage::Sleep send failures explicitly.
Ignoring the send result hides closed-channel failures; then this over-burst path returns without applying any backoff locally.

Suggested patch

-            let _ = feed_sender
-                .send(FeedMessage::Sleep(
-                    (*cluster).clone(),
-                    CONFIG.queue.cluster_empty_sleep,
-                ))
-                .await;
+            if let Err(err) = feed_sender
+                .send(FeedMessage::Sleep(
+                    (*cluster).clone(),
+                    CONFIG.queue.cluster_empty_sleep,
+                ))
+                .await
+            {
+                debug!(
+                    "Failed to enqueue cluster sleep for {} (over-burst path): {}",
+                    cluster, err
+                );
+                tokio::time::sleep(CONFIG.queue.cluster_empty_sleep).await;
+            }
             return;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/pipeline/matcher.rs` around lines 306 - 312, The
code currently ignores the result of
feed_sender.send(FeedMessage::Sleep((*cluster).clone(),
CONFIG.queue.cluster_empty_sleep)).await; change this to handle the Result
explicitly: if send returns Ok proceed as before, but if Err (channel closed)
log the failure (include context like cluster and that FeedMessage::Sleep
failed) and apply a local backoff before returning (e.g., await
tokio::time::sleep(Duration::from_millis(CONFIG.queue.cluster_empty_sleep)) or
equivalent) so the over-burst path does not immediately return without backoff;
update matcher.rs around the feed_sender send call to match on the Result and
handle Err accordingly.

rust/crates/scheduler/src/dao/resource_accounting_dao.rs (1)> 149-152: 💤 Low value

Redundant p.pk_show = ANY($1) filter in point recompute query.
Line 151 applies p.pk_show = ANY($1) in the LEFT JOIN condition, but p.pk_show = pt2.pk_show is already constrained by the join, and pt2.pk_show = ANY($1) at line 152 already filters the point rows. The filter at line 151 is redundant.
Not a correctness issue, but removing it would simplify the query slightly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/scheduler/src/dao/resource_accounting_dao.rs` around lines 149 -
152, Remove the redundant p.pk_show = ANY($1) predicate from the LEFT JOIN on
proc (alias p) in the point recompute SQL: the join already enforces p.pk_show =
pt2.pk_show and the WHERE clause pt2.pk_show = ANY($1) filters pt2, so delete
the p.pk_show = ANY($1) condition in the LEFT JOIN to simplify the query; update
the SQL string in resource_accounting_dao.rs where the LEFT JOIN proc p ... AND
p.pk_show = ANY($1) appears and ensure spacing/AND chaining remains valid after
removal.

pycue/tests/wrappers/test_show.py (1)> 191-201: ⚡ Quick win

Cover both toggle directions in the RPC assertion.
testSetSchedulerManaged currently validates only enabled=True. Adding enabled=False in the same test tightens contract coverage for the new toggle RPC with minimal effort.

♻️ Suggested test update

     def testSetSchedulerManaged(self, getStubMock):
         stubMock = mock.Mock()
         stubMock.SetSchedulerManaged.return_value = show_pb2.ShowSetSchedulerManagedResponse()
         getStubMock.return_value = stubMock
 
         show = opencue.wrappers.show.Show(show_pb2.Show(name=TEST_SHOW_NAME))
-        show.setSchedulerManaged(True)
-
-        stubMock.SetSchedulerManaged.assert_called_with(
-            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True),
-            timeout=mock.ANY)
+        show.setSchedulerManaged(True)
+        show.setSchedulerManaged(False)
+
+        stubMock.SetSchedulerManaged.assert_any_call(
+            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True),
+            timeout=mock.ANY)
+        stubMock.SetSchedulerManaged.assert_any_call(
+            show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=False),
+            timeout=mock.ANY)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pycue/tests/wrappers/test_show.py` around lines 191 - 201, The test only
asserts the RPC for enabling the scheduler; update testSetSchedulerManaged to
exercise both toggle directions by invoking show.setSchedulerManaged(True) and
show.setSchedulerManaged(False) (or call once and then again) and assert the
stubMock.SetSchedulerManaged was called for both cases using either two
assert_called_with checks referencing
show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True) and
...enabled=False) or a single assert_has_calls with both expected requests;
target symbols: testSetSchedulerManaged, show.setSchedulerManaged,
stubMock.SetSchedulerManaged, and ShowSetSchedulerManagedRequest.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cuebot/build.gradle`:
- Around line 41-44: Update the Lettuce dependency to a non-vulnerable release
by changing the dependency declaration for io.lettuce:lettuce-core (currently
'6.3.2.RELEASE') to at least '6.5.1.RELEASE' (preferably the latest, e.g.
'6.7.1.RELEASE'); keep the explicit reactor-core override for
io.projectreactor:reactor-core at the 3.6.x line (e.g. '3.6.5') so Lettuce
compatibility is preserved and remove/replace the old version reference wherever
io.lettuce:lettuce-core '6.3.2.RELEASE' is declared.

In
`@cuebot/src/main/resources/conf/ddl/postgres/migrations/V44__scheduler_pending_query_indexes.sql`:
- Line 20: The manual backfill example in
V44__scheduler_pending_query_indexes.sql incorrectly inserts version 40 into
flyway_schema_history; update the VALUES tuple to use 44 to match this migration
(ensure the INSERT INTO flyway_schema_history(... ) VALUES (..., 44, ... )
reflects the V44__scheduler_pending_query_indexes.sql migration) so the
flyway_schema_history row matches the migration version and prevents poisoning
subsequent runs.

In `@docs/_docs/developer-guide/redis-accounting.md`:
- Around line 247-253: The two unlabeled fenced code blocks that contain the
numbered acct booking steps and the cueadmin command examples should be given
explicit languages to satisfy markdownlint MD040: change the fence wrapping the
numbered list (the block starting "1. Read current state of acct:sub /
acct:folder / acct:job / acct:layer / acct:point") to use a text label (e.g.,
```text) and change the fence wrapping the cueadmin examples (the block with
"cueadmin -show <name> -setSchedulerManaged true/false") to use a shell label
(e.g., ```bash); update both occurrences (the one around the acct steps and the
one around the cueadmin commands) accordingly.
- Line 14: The heading "How Cuebot and the Rust scheduler coordinate per-show
resource accounting through Redis" is using ### causing a level jump from # to
### and failing MD001; change that heading token from "###" to "##" so it
follows the top-level "#" and fixes the markdown lint error (update the heading
line where that exact text appears).

In `@docs/_docs/developer-guide/scheduler.md`:
- Around line 648-654: The fenced code block listing the scheduler steps
(starting "1. Read current state of acct:sub / acct:folder / acct:job /
acct:layer / acct:point" through "5. Then transactionally INSERT proc in
Postgres (outside Lua)") is unlabeled and triggers MD040; add a language label
to the opening triple-backtick (e.g., ```text) to satisfy the linter and
preserve formatting so the block is treated as a code/preformatted block.

In `@docs/_docs/getting-started/deploying-scheduler.md`:
- Around line 262-265: The docs contain inconsistent cueadmin usage: keep the
correct form "cueadmin -scheduler-managed <SHOW> ON|OFF" and replace any
incorrect examples using "-show <name> -setSchedulerManaged true|false" (look
for occurrences of the flags -setSchedulerManaged and -show in docs, e.g.,
developer-guide scheduler and redis-accounting sections); update those examples
to use the exact flag and value format "-scheduler-managed <SHOW> ON" or "OFF"
so all docs match the working CLI syntax.

In `@pycue/opencue/wrappers/show.py`:
- Around line 223-235: setSchedulerManaged currently calls
stub.SetSchedulerManaged but doesn't update the wrapper's local state, so
subsequent schedulerManaged() reads can be stale; after calling
SetSchedulerManaged (in setSchedulerManaged), assign the returned value to
self.data.scheduler_managed (or set self.data.scheduler_managed = value if the
RPC returns no body) using the response from stub.SetSchedulerManaged to
synchronize the wrapper, ensuring schedulerManaged() reflects the new state.

In `@rust/crates/scheduler/src/accounting/dao.rs`:
- Around line 146-157: The query_managed_show_ids function currently maps DB
rows using parse_uuid which unwraps and will panic on malformed UUIDs; change
the handling to use a fallible parse instead: update parse_uuid (in
dao/helpers.rs) to return Result<Uuid, Report> or provide a new try_parse_uuid
that calls Uuid::parse_str and returns an Err on failure, then modify
query_managed_show_ids (and other callers) to collect a Result<Vec<Uuid>, _> or
filter/log and skip invalid rows rather than calling unwrap—propagate the error
with wrap_err for diagnostics or log and skip malformed rows so a single corrupt
DB value cannot crash the scheduler.

In `@rust/crates/scheduler/src/accounting/recompute.rs`:
- Around line 46-52: The current recompute task exits permanently if
ResourceAccountingDao::new().await returns Err, causing a long-lived outage;
change the initialization to retry instead of returning: wrap the call to
ResourceAccountingDao::new().await (the code creating pg_dao/Arc::new(d)) in a
retry loop with exponential backoff (or fixed delay) and error logging, continue
retrying until success, and only proceed to create pg_dao and enter the
recompute loop after a successful Result; ensure any temporary failures do not
cause an early return from the recompute task.

In `@rust/crates/scheduler/src/accounting/redis_client.rs`:
- Around line 142-144: Update the unit wording in the Redis client docs: change
"centicores" to "cores" in the doc comment that begins "Reads the subscription
hash's booked cores + burst in one round-trip..." and the other similar doc
comment around the subsequent block (the occurrences currently at the same doc
block and the one at lines ~161-162). Ensure both doc comments and any inline
mention in redis_client.rs consistently state that Redis counters operate in
cores (not centicores) so the Lua contract and callers are not misled.
- Line 155: The call currently uses conn.hget(&key, &["int_cores",
"burst"]).await? which is a single-field read API but is passing multiple
fields; change this to the multi-field API conn.hmget(&key, &["int_cores",
"burst"]).await? (keeping the expected type Vec<Option<i64>> and the surrounding
error handling) so the Redis client sends HMGET for the fields "int_cores" and
"burst" and maps into values as before.

In `@rust/crates/scheduler/src/cluster.rs`:
- Around line 418-423: The code can panic when current_index_atomic contains an
out-of-range value after self.clusters shrinks; before indexing clusters use the
current clusters length to clamp the index (and handle empty clusters).
Specifically, in the block around current_index_atomic.load, get let len =
clusters.len(); if len == 0 return/skip; compute a safe_index = current_index %
len (or safe_index = min(current_index, len-1)), then read item =
clusters[safe_index].clone(); compute next_index = (safe_index + 1) % len and
store that back into current_index_atomic; this prevents indexing with stale,
too-large indices while preserving wrap-around detection and forward progress.

In `@rust/crates/scheduler/src/models/core_size.rs`:
- Around line 68-71: The CoreSize conversion functions
(CoreSize::from_multiplied and CoreSize::from_multiplied_cap) perform unchecked
i64 -> i32 narrowing using `as i32`, which can silently wrap; change them to use
a checked conversion (e.g., try_into / i32::try_from) or explicit saturation
with bounds checking against i32::MIN/i32::MAX and handle failures by returning
a Result or clamping to safe limits. Update the functions that reference
CONFIG.queue.core_multiplier to compute the intermediate i64, check for
overflow/underflow before converting, and either return an error (propagate
Result) or clamp to i32 bounds and document the behavior so no silent wrapping
occurs.

In `@rust/crates/scheduler/src/pipeline/dispatcher/actor.rs`:
- Around line 482-491: BookingDelta is being built with DispatchLayer.folder_id
and dept_id that may be Uuid::nil(), which would produce incorrect
acct:folder:{folder_id} and acct:point:{dept_id}:{show_id} Redis keys; add an
explicit nil-UUID guard before constructing BookingDelta (best place: inside
dispatch_virtual_proc right before creating BookingDelta or at the callsite in
dispatch_inner/consume_host_virtual_resources) that checks
DispatchLayer.folder_id.is_nil() and DispatchLayer.dept_id.is_nil() and
rejects/returns an Err (or skips booking) with a logged error; ensure the check
references the DispatchLayer fields and prevents creating BookingDelta when
either UUID is nil so no acct:folder or acct:point keys are written for the zero
UUID.

---

Nitpick comments:
In `@pycue/tests/wrappers/test_show.py`:
- Around line 191-201: The test only asserts the RPC for enabling the scheduler;
update testSetSchedulerManaged to exercise both toggle directions by invoking
show.setSchedulerManaged(True) and show.setSchedulerManaged(False) (or call once
and then again) and assert the stubMock.SetSchedulerManaged was called for both
cases using either two assert_called_with checks referencing
show_pb2.ShowSetSchedulerManagedRequest(show=show.data, enabled=True) and
...enabled=False) or a single assert_has_calls with both expected requests;
target symbols: testSetSchedulerManaged, show.setSchedulerManaged,
stubMock.SetSchedulerManaged, and ShowSetSchedulerManagedRequest.

In `@rust/crates/scheduler/src/config/mod.rs`:
- Around line 101-105: The RedisConfig::url() method currently returns
"redis://host:port/" with no auth support; update RedisConfig (struct) to
include optional credentials (e.g., username: Option<String>, password:
Option<String>) or an optional raw_connection_string: Option<String>, and modify
RedisConfig::url() to: if raw_connection_string is Some return it; otherwise
build the URL including "redis://username:password@host:port/" when credentials
are present (omit username if only password needed per Redis URI rules) and
percent-encode credentials as necessary to avoid malformed URLs. Ensure you
reference the RedisConfig struct and the url() method so callers keep backwards
compatibility by keeping fields optional.

In `@rust/crates/scheduler/src/dao/resource_accounting_dao.rs`:
- Around line 149-152: Remove the redundant p.pk_show = ANY($1) predicate from
the LEFT JOIN on proc (alias p) in the point recompute SQL: the join already
enforces p.pk_show = pt2.pk_show and the WHERE clause pt2.pk_show = ANY($1)
filters pt2, so delete the p.pk_show = ANY($1) condition in the LEFT JOIN to
simplify the query; update the SQL string in resource_accounting_dao.rs where
the LEFT JOIN proc p ... AND p.pk_show = ANY($1) appears and ensure spacing/AND
chaining remains valid after removal.

In `@rust/crates/scheduler/src/pipeline/matcher.rs`:
- Around line 306-312: The code currently ignores the result of
feed_sender.send(FeedMessage::Sleep((*cluster).clone(),
CONFIG.queue.cluster_empty_sleep)).await; change this to handle the Result
explicitly: if send returns Ok proceed as before, but if Err (channel closed)
log the failure (include context like cluster and that FeedMessage::Sleep
failed) and apply a local backoff before returning (e.g., await
tokio::time::sleep(Duration::from_millis(CONFIG.queue.cluster_empty_sleep)) or
equivalent) so the over-burst path does not immediately return without backoff;
update matcher.rs around the feed_sender send call to match on the Result and
handle Err accordingly.

🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a7aa165-b9e9-4987-a82f-97ec0dd9c477

📥 Commits
Reviewing files that changed from the base of the PR and between e504bc9 and eac7ebf.

📒 Files selected for processing (82)

.gitignore
VERSION.in
cueadmin/cueadmin/common.py
cueadmin/tests/test_common.py
cuebot/build.gradle
cuebot/src/main/java/com/imageworks/spcue/ShowEntity.java
cuebot/src/main/java/com/imageworks/spcue/VirtualProc.java
cuebot/src/main/java/com/imageworks/spcue/config/AppConfig.java
cuebot/src/main/java/com/imageworks/spcue/dao/ShowDao.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatchQuery.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/DispatcherDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/ProcDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/ShowDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/dao/postgres/WhiteboardDaoJdbc.java
cuebot/src/main/java/com/imageworks/spcue/servant/ManageShow.java
cuebot/src/main/java/com/imageworks/spcue/service/AccountingRedisPublisher.java
cuebot/src/main/java/com/imageworks/spcue/service/LettuceAccountingRedisPublisher.java
cuebot/src/main/resources/conf/ddl/postgres/migrations/V44__scheduler_pending_query_indexes.sql
cuebot/src/main/resources/conf/ddl/postgres/migrations/V45__show_scheduler_managed.sql
cuebot/src/main/resources/conf/spring/applicationContext-accounting.xml
cuebot/src/main/resources/opencue.properties
cuebot/src/test/java/com/imageworks/spcue/config/TestAppConfig.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/DispatcherDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/ProcDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/ShowDaoTests.java
cuebot/src/test/java/com/imageworks/spcue/test/servant/ManageShowTests.java
cuebot/src/test/java/com/imageworks/spcue/test/service/LettuceAccountingRedisPublisherTests.java
docs/_docs/developer-guide/redis-accounting.md
docs/_docs/developer-guide/scheduler.md
docs/_docs/getting-started/deploying-scheduler.md
docs/news/2025-12-12-distributed-scheduler-release.md
proto/src/show.proto
pycue/opencue/wrappers/show.py
pycue/tests/wrappers/test_show.py
rust/config/scheduler.yaml
rust/crates/scheduler/Cargo.toml
rust/crates/scheduler/src/accounting/booking_delta.rs
rust/crates/scheduler/src/accounting/bootstrap.rs
rust/crates/scheduler/src/accounting/dao.rs
rust/crates/scheduler/src/accounting/error.rs
rust/crates/scheduler/src/accounting/limit_reseed.rs
rust/crates/scheduler/src/accounting/lua.rs
rust/crates/scheduler/src/accounting/managed_shows.rs
rust/crates/scheduler/src/accounting/mod.rs
rust/crates/scheduler/src/accounting/recompute.rs
rust/crates/scheduler/src/accounting/redis_client.rs
rust/crates/scheduler/src/cluster.rs
rust/crates/scheduler/src/cluster_key.rs
rust/crates/scheduler/src/config/mod.rs
rust/crates/scheduler/src/dao/cluster_dao.rs
rust/crates/scheduler/src/dao/frame_dao.rs
rust/crates/scheduler/src/dao/host_dao.rs
rust/crates/scheduler/src/dao/job_dao.rs
rust/crates/scheduler/src/dao/layer_dao.rs
rust/crates/scheduler/src/dao/mod.rs
rust/crates/scheduler/src/dao/proc_dao.rs
rust/crates/scheduler/src/dao/resource_accounting_dao.rs
rust/crates/scheduler/src/host_cache/actor.rs
rust/crates/scheduler/src/host_cache/cache.rs
rust/crates/scheduler/src/host_cache/messages.rs
rust/crates/scheduler/src/host_cache/mod.rs
rust/crates/scheduler/src/lib.rs
rust/crates/scheduler/src/main.rs
rust/crates/scheduler/src/metrics/mod.rs
rust/crates/scheduler/src/models/core_size.rs
rust/crates/scheduler/src/models/frame.rs
rust/crates/scheduler/src/models/layer.rs
rust/crates/scheduler/src/models/mod.rs
rust/crates/scheduler/src/models/subscription.rs
rust/crates/scheduler/src/models/virtual_proc.rs
rust/crates/scheduler/src/pipeline/dispatcher/actor.rs
rust/crates/scheduler/src/pipeline/dispatcher/error.rs
rust/crates/scheduler/src/pipeline/dispatcher/mod.rs
rust/crates/scheduler/src/pipeline/entrypoint.rs
rust/crates/scheduler/src/pipeline/layer_permit.rs
rust/crates/scheduler/src/pipeline/matcher.rs
rust/crates/scheduler/src/pipeline/mod.rs
rust/crates/scheduler/src/pipeline/placement.rs
rust/crates/scheduler/src/resource_accounting.rs
rust/crates/scheduler/tests/redis_integration.rs
rust/crates/scheduler/tests/smoke_tests.rs
rust/crates/scheduler/tests/util.rs

💤 Files with no reviewable changes (5)

This review was done before the branch has been properly rebased with master and contained many comments unrelated to the actual work.

Signed-off-by: Diego Tavares <dtavares@imageworks.com>

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

DiegoTavares force-pushed the sched_e-pvm branch from eac7ebf to 21fb54a Compare June 9, 2026 01:34

DiegoTavares changed the title ~~Add scheduling mode: E-PVM~~ [scheduler] Add scheduling mode: E-PVM Jun 9, 2026

DiegoTavares added 3 commits June 9, 2026 15:28

Minor fixes after first review pass

ed9ac44

Fix conflict left over mark

2557c62

DiegoTavares force-pushed the sched_e-pvm branch from 21fb54a to 2557c62 Compare June 9, 2026 22:32

DiegoTavares added 2 commits June 11, 2026 09:22

Merge branch 'master' into sched_e-pvm

15250c3

Signed-off-by: Diego Tavares <dtavares@imageworks.com>

Fix unit test broken by upstream sync

3147111

DiegoTavares marked this pull request as ready for review June 11, 2026 21:56

DiegoTavares requested review from lithorus and ramonfigueiredo as code owners June 11, 2026 21:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[scheduler] Add scheduling mode: E-PVM#2398

[scheduler] Add scheduling mode: E-PVM#2398
DiegoTavares wants to merge 5 commits into
AcademySoftwareFoundation:masterfrom
DiegoTavares:sched_e-pvm

DiegoTavares commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DiegoTavares commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DiegoTavares commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DiegoTavares commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DiegoTavares commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading