
Convert scheduler tests to use the new scheduler #136

Draft
AlexJones0 wants to merge 7 commits into lowRISC:master from AlexJones0:async_scheduler_tests


@AlexJones0 commented Apr 1, 2026

Note: this PR is currently a draft because it depends on #135, which has not yet been merged. The first 3 commits are from that PR and can be safely ignored; only the last 4 commits are relevant. It is otherwise ready to review.

This PR is the fourteenth of a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with key goals of long-term maintainability and extensibility.

This PR converts the scheduler tests to use the new async scheduler so that we can test the new design. As CI shows, the new scheduler passes all of the existing scheduler tests, plus a couple of new tests added in this PR. The intention is not to merge this PR yet: it should probably wait until all the other scheduler integration is completed and merged, and should only be merged right before making the switch between the old and new scheduler. Until then, this PR is intended to show the correctness and functionality of the new scheduler via the existing tests.

See the commit messages for more information.

See the explanatory comments added to JobStatus. The intention is that
the new async scheduler will distinguish between jobs that are blocked
due to unfinished dependencies (`SCHEDULED`), and those that are pending
because there is no availability to run them, despite their dependencies
being fulfilled (`QUEUED`). This new state is currently unused.

Also add a short test to prevent potential future bugs from status
shorthand name collisions.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
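
As a rough illustration of the `SCHEDULED` / `QUEUED` distinction and the shorthand collision check described in the commit message above, here is a minimal sketch. The member list, the `shorthand` property (first letter of the member name), and `find_shorthand_collisions` are assumptions made for this example and are not taken from DVSim's actual `JobStatus`.

```python
from enum import Enum, auto


class JobStatus(Enum):
    """Hypothetical job states, for illustration only."""

    SCHEDULED = auto()  # blocked: waiting on unfinished dependencies
    QUEUED = auto()     # ready: dependencies met, waiting for a free slot
    RUNNING = auto()
    PASSED = auto()
    FAILED = auto()
    KILLED = auto()

    @property
    def shorthand(self) -> str:
        # Assumption: the shorthand is the first letter of the member name.
        return self.name[0]


def find_shorthand_collisions(statuses) -> dict:
    """Map each shorthand shared by several statuses to their names."""
    by_shorthand = {}
    for status in statuses:
        by_shorthand.setdefault(status.shorthand, []).append(status.name)
    return {s: names for s, names in by_shorthand.items() if len(names) > 1}


# A test in this style fails as soon as a new status reuses a shorthand.
assert find_shorthand_collisions(JobStatus) == {}
```

With a scheme like this, adding a member such as `PENDING` would collide with `PASSED` on `P`, which is the kind of bug the short test is meant to catch.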
This field will be used to inform the new scheduler of which backend it
should use to execute a job. Though the plumbing is not there in the
rest of DVSim, the intent is to make the scheduler such that it could
feasibly be run with multiple backends (e.g. some jobs faked, some jobs
on the local machine, some dispatched to various remote clusters).

To support this design, each job spec can now specify that it should be
run on a certain backend, with some designated string name. To instead
just use the configured default backend (which is the current behaviour,
as the current scheduler only supports one backend / `launcher_cls`),
this can be set to `None`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
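
The backend selection described above could look roughly like the following sketch. The `JobSpec` shown here is a cut-down stand-in with only two fields, and `resolve_backend` is a hypothetical helper; neither reflects DVSim's real definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobSpec:
    """Cut-down, hypothetical job spec with only the fields needed here."""

    name: str
    backend: Optional[str] = None  # None means "use the configured default"


def resolve_backend(spec: JobSpec, backends: dict, default: str):
    """Pick the backend a job should run on, falling back to the default."""
    key = spec.backend if spec.backend is not None else default
    if key not in backends:
        raise ValueError(f"job {spec.name!r} requests unknown backend {key!r}")
    return backends[key]


# Backends are plain strings here; real ones would be backend objects.
backends = {"local": "local-backend", "fake": "fake-backend"}
assert resolve_backend(JobSpec("build"), backends, "local") == "local-backend"
assert resolve_backend(JobSpec("sim", backend="fake"), backends, "local") == "fake-backend"
```

Keeping `None` as the default preserves the current single-backend behaviour for any job spec that does not set the field.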
For now, this is separated in `async_core.py` - the intention is that
it will eventually replace the scheduler in `core.py` when all
necessary components for it to work are integrated.

This commit contains the fully async scheduler design. Some notes:
- Everything is now async. The scheduler is no longer tied to a Timer
  object, nor does it have to manage its print interval and poll
  frequency. It takes advantage of parallelism via cooperative
  multitasking as much as possible.
- The scheduler is designed to support multiple different backends (new
  async versions of launchers). Jobs are dispatched according to their
  specifications and scheduler parameters.
- The scheduler implements the Observer pattern for various events
  (start, end, job status change, kill signal), allowing consumers that
  want to use this functionality (e.g. instrumentation, status printer)
  to hook into the scheduler, instead of unnecessarily coupling code.
- The previous scheduler only recognized killed jobs when they were
  reached in the queue and their status was updated. The new design
  propagates status updates transitively as soon as the information is
  known, so the status of every affected job is updated immediately.
- Since the scheduler knows _why_ it is killing the jobs, we attach
  JobStatusInfo information to give more info in the failure buckets.
- The job DAG is indexed and validated during initialization;
  dependency cycles are detected and cause an error to be raised.
- Job info is encapsulated by records, keeping state centralized
  (outside of indexes).
- The scheduler now accepts a prioritization function. It keeps jobs
  in a heap and schedules them in order of highest priority. The default
  prioritisation is by weights, but this can be customized.
- The scheduler now has its own modifiable parallelism limit, separate
  from each individual backend's parallelism limit.
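
Two of the notes above, DAG validation with cycle detection and priority-ordered scheduling from a heap, can be sketched together. This is only an illustration under stated assumptions: `dispatch_order`, its arguments, and the use of Kahn's algorithm are choices made for this example, not the code in `async_core.py`.

```python
import heapq
from collections.abc import Callable, Iterable, Mapping
from itertools import count


def dispatch_order(
    deps: Mapping[str, Iterable[str]],
    priority: Callable[[str], int],
) -> list[str]:
    """Order jobs so dependencies come first, highest priority first among
    the jobs that are ready. Raises ValueError on bad or cyclic dependencies.
    """
    graph = {job: set(ds) for job, ds in deps.items()}
    unknown = {d for ds in graph.values() for d in ds} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    # Index the DAG: unfinished dependency counts and reverse edges.
    waiting_on = {job: len(ds) for job, ds in graph.items()}
    dependents = {job: [] for job in graph}
    for job, ds in graph.items():
        for dep in ds:
            dependents[dep].append(job)

    # heapq is a min-heap, so negate the priority; the counter breaks ties
    # in insertion order and avoids comparing job names.
    tie_breaker = count()
    heap = [(-priority(j), next(tie_breaker), j)
            for j, n in waiting_on.items() if n == 0]
    heapq.heapify(heap)

    order = []
    while heap:
        _, _, job = heapq.heappop(heap)
        order.append(job)
        for child in dependents[job]:
            waiting_on[child] -= 1
            if waiting_on[child] == 0:
                heapq.heappush(heap, (-priority(child), next(tie_breaker), child))

    # Any job that never became ready must sit on (or behind) a cycle.
    if len(order) != len(graph):
        stuck = sorted(j for j, n in waiting_on.items() if n > 0)
        raise ValueError(f"dependency cycle involving: {stuck}")
    return order


weights = {"build": 1, "test_a": 1, "test_b": 5, "report": 1}
deps = {"build": [], "test_a": ["build"], "test_b": ["build"],
        "report": ["test_a", "test_b"]}
print(dispatch_order(deps, weights.__getitem__))
# ['build', 'test_b', 'test_a', 'report']
```

Passing a different `priority` callable changes the order among ready jobs without touching the dependency handling.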

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
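
The Observer hooks and the transitive status updates mentioned in the commit message above might be combined along these lines. `StatusTracker`, its string statuses, and the rule that a failure kills every still-blocked dependent are assumptions for this sketch; the real scheduler attaches `JobStatusInfo` rather than a plain reason string.

```python
from collections import deque


class StatusTracker:
    """Hypothetical status store that notifies observers on every change."""

    def __init__(self, deps):
        self.status = {job: "SCHEDULED" for job in deps}
        self.reason = {}
        self._observers = []
        # Reverse edges: which jobs depend on a given job.
        self._dependents = {job: [] for job in deps}
        for job, ds in deps.items():
            for dep in ds:
                self._dependents[dep].append(job)

    def subscribe(self, callback):
        """Register callback(job, old_status, new_status)."""
        self._observers.append(callback)

    def _set(self, job, new_status, reason=None):
        old_status = self.status[job]
        self.status[job] = new_status
        if reason is not None:
            self.reason[job] = reason
        for callback in self._observers:
            callback(job, old_status, new_status)

    def fail(self, job):
        """Mark a job failed and immediately kill everything downstream."""
        self._set(job, "FAILED")
        pending = deque(self._dependents[job])
        while pending:
            child = pending.popleft()
            if self.status[child] != "SCHEDULED":
                continue  # already handled via another path
            self._set(child, "KILLED", reason=f"upstream job {job!r} failed")
            pending.extend(self._dependents[child])


deps = {"build": [], "sim": ["build"], "cov": ["sim"], "lint": []}
tracker = StatusTracker(deps)
events = []
tracker.subscribe(lambda job, old, new: events.append((job, old, new)))
tracker.fail("build")
assert tracker.status == {"build": "FAILED", "sim": "KILLED",
                          "cov": "KILLED", "lint": "SCHEDULED"}
```

A status printer or instrumentation hook would subscribe in the same way as the `events` list does here, without the tracker knowing anything about it.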
The new scheduler uses an async model, so it's helpful for testing to
pull in the asyncio pytest plugin.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
@AlexJones0 force-pushed the async_scheduler_tests branch from a5e10db to 93d1b87 on April 1, 2026 16:18

@machshev left a comment


LGTM! Thanks @AlexJones0

@AlexJones0 force-pushed the async_scheduler_tests branch from 93d1b87 to 1a9686d on April 1, 2026 18:08
To ensure that Nix users can pull in the new Python dependency.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit performs the changes necessary to port the scheduler tests
to use the new async scheduler. This involves:
- Creating a mock RuntimeBackend. For now, to keep changes minimal and
  simple, we just use the LauncherAdapter with the MockLauncher. In the
  future it would be nice to make a dedicated mock RuntimeBackend as well.
- Marking all the tests as asyncio tests, declaring them with `async def`,
  and using the new scheduler interface.
- Updating a couple of tests that were awkwardly constructed (e.g. in terms
  of targets/ordering) due to constraints of the old scheduler.
With these changes, _all_ scheduler tests are now passing with the new
async scheduler across multiple iterations.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
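
For readers unfamiliar with the adapter approach described above, the general shape is a thin async wrapper around a synchronous mock, exercised from an `async def` test. The sketch below uses hypothetical stand-ins (`FakeSyncLauncher`, `AsyncAdapter`, `run_all`) rather than DVSim's `MockLauncher` and `LauncherAdapter`, whose interfaces are not shown in this PR description.

```python
import asyncio


class FakeSyncLauncher:
    """Hypothetical synchronous launcher with scripted per-job outcomes."""

    def __init__(self, outcomes):
        self._outcomes = outcomes

    def run(self, job):
        return self._outcomes.get(job, "PASSED")


class AsyncAdapter:
    """Hypothetical adapter exposing a sync launcher through an async API."""

    def __init__(self, launcher):
        self._launcher = launcher

    async def run(self, job):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._launcher.run, job)


async def run_all(backend, jobs):
    """Run every job concurrently and collect results by job name."""
    results = await asyncio.gather(*(backend.run(job) for job in jobs))
    return dict(zip(jobs, results))


# Under pytest-asyncio this would carry the @pytest.mark.asyncio marker;
# here it is driven with asyncio.run so the sketch needs only the stdlib.
async def test_failing_job_is_reported():
    backend = AsyncAdapter(FakeSyncLauncher({"sim": "FAILED"}))
    results = await run_all(backend, ["build", "sim"])
    assert results == {"build": "PASSED", "sim": "FAILED"}


asyncio.run(test_failing_job_is_reported())
```

Reusing the existing mock through an adapter keeps the test diff small, at the cost of still exercising the old launcher interface underneath.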
Extend an existing test case for launcher / runtime backend parallelism
to be able to also consider global scheduler-level parallelism.

Introduce a new test to check that we can provide a custom
prioritization function, and that jobs are indeed scheduled according to
the priorities assigned by it if not blocked.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
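
One way to check both a scheduler-level and a backend-level parallelism limit, as the extended test case above aims to, is to record peak concurrency at each level. The sketch below models both limits with `asyncio.Semaphore`; the names `PeakCounter` and `measure_peaks` and the nesting of the two limits are assumptions for illustration, not how the DVSim tests are written.

```python
import asyncio


class PeakCounter:
    """Context manager that tracks the maximum number of concurrent holders."""

    def __init__(self):
        self.running = 0
        self.peak = 0

    def __enter__(self):
        self.running += 1
        self.peak = max(self.peak, self.running)
        return self

    def __exit__(self, *exc_info):
        self.running -= 1


async def measure_peaks(n_jobs, backend_limit, global_limit):
    """Run n_jobs dummy jobs and return (global_peak, backend_peak)."""
    # Semaphores are created inside the coroutine so they belong to the
    # running event loop on every supported Python version.
    global_slots = asyncio.Semaphore(global_limit)
    backend_slots = asyncio.Semaphore(backend_limit)
    global_peak, backend_peak = PeakCounter(), PeakCounter()

    async def run_job():
        # A job first needs a scheduler-wide slot, then a backend slot.
        async with global_slots:
            with global_peak:
                async with backend_slots:
                    with backend_peak:
                        await asyncio.sleep(0.01)  # stand-in for real work

    await asyncio.gather(*(run_job() for _ in range(n_jobs)))
    return global_peak.peak, backend_peak.peak


# The tighter of the two limits bounds how many jobs actually run at once.
assert asyncio.run(measure_peaks(8, backend_limit=4, global_limit=2)) == (2, 2)
assert asyncio.run(measure_peaks(8, backend_limit=3, global_limit=8)) == (8, 3)
```

In the second case all eight jobs hold a scheduler-wide slot, but only three are ever inside the backend at the same time.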
@AlexJones0 force-pushed the async_scheduler_tests branch from 1a9686d to ccf3abe on April 1, 2026 18:09