Add new async scheduler by AlexJones0 · Pull Request #135 · lowRISC/dvsim

AlexJones0 · 2026-04-01T14:50:16Z

This PR is the thirteenth in a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with the key goals of long-term maintainability and extensibility.

Edit: I've also just opened #136. This PR should be merged later down the line (after more integration PRs), but is intended for now to show that the new scheduler is passing the existing test suite.

This is probably the largest PR, and contains the entire new async scheduler implementation (i.e. the scheduler rewrite). Note that the code to integrate this new async scheduler is not yet included - I thought about including it to make the async scheduler usable in this PR, but decided that it would probably be a bit too much to review at once, so I'm deferring it to a future PR. Note also that the new scheduler is being merged in stages - the old scheduler, launchers etc. will not be removed until the official transition to the new scheduler.

Some of the main features/differences to note about the new scheduler:

- Everything is now async. The scheduler is no longer tied to a Timer object, nor does it have to manage its print interval and poll frequency. It takes advantage of parallelism via cooperative multitasking as much as possible.
- The scheduler is designed to support multiple different backends (new async versions of launchers). Jobs are dispatched according to their specifications and scheduler parameters.
- The scheduler implements the Observer pattern for various events, allowing consumers that want to use this functionality (e.g. instrumentation, status printer) to hook into the scheduler, instead of unnecessarily coupling code. These consumers are not included in this PR (they will come later).
- The previous scheduler only recognized killed jobs when they were reached in the queue and their status was updated. The new design propagates status updates transitively to all affected jobs as soon as the information is known. Also, since the scheduler knows why it is killing jobs, we add the reason to give more failure bucket info.
- The job DAG is indexed and validated during initialization; dependency cycles are detected and cause an error to be raised.
- The scheduler now accepts a prioritization function. It keeps jobs in a heap and dispatches the highest-priority job first. Default prioritization is by weights, but this can be customized.
- The scheduler now has its own separate modifiable parallelism limit.
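As a rough illustration of the last two points, the sketch below dispatches jobs from a heap under a parallelism limit using `asyncio`. It is not the code in this PR: `Job`, `run_all` and the `duration` field are hypothetical names invented for the example, and the `asyncio.sleep` call stands in for whatever a real backend would do.

```python
import asyncio
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a job spec (not DVSim's JobSpec)."""

    name: str
    weight: int
    duration: float = 0.0


async def run_all(jobs, max_parallelism, priority_fn=lambda job: job.weight):
    """Start jobs highest-priority first, at most max_parallelism at once.

    Returns the job names in the order they were started.
    """
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")

    # heapq is a min-heap, so the priority is negated to pop the highest
    # first. The counter breaks ties so Job objects are never compared.
    counter = itertools.count()
    heap = [(-priority_fn(job), next(counter), job) for job in jobs]
    heapq.heapify(heap)

    async def run(job):
        await asyncio.sleep(job.duration)  # placeholder for a backend call

    started = []
    running = set()
    while heap or running:
        # Fill every free slot from the heap.
        while heap and len(running) < max_parallelism:
            _, _, job = heapq.heappop(heap)
            started.append(job.name)
            running.add(asyncio.create_task(run(job)))
        # Yield to the event loop until at least one job finishes.
        done, running = await asyncio.wait(
            running, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            task.result()  # re-raise any exception from the job
    return started
```

Passing a different `priority_fn` (for example one that prefers short jobs) changes the dispatch order without touching the loop, which is the kind of customization the prioritization bullet describes.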

I recommend reading the commit messages for more information.

See the explanatory comments added to JobStatus. The intention is that the new async scheduler will distinguish between jobs that are blocked due to unfinished dependencies (`SCHEDULED`), and those that are pending because there is no availability to run them, despite their dependencies being fulfilled (`QUEUED`). This new state is currently unused. Also add a short test to prevent potential future bugs from status shorthand name collisions.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
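To make the `SCHEDULED` / `QUEUED` distinction concrete, here is a small sketch. Only those two state names come from the commit message; the other members, the first-letter shorthand rule and the helper names are assumptions made for the example and may not match DVSim's actual `JobStatus`.

```python
from enum import Enum


class JobStatus(Enum):
    """Illustrative job states; only SCHEDULED and QUEUED are from the PR."""

    SCHEDULED = "scheduled"  # blocked on unfinished dependencies
    QUEUED = "queued"  # dependencies fulfilled, waiting for a free slot
    RUNNING = "running"  # hypothetical
    PASSED = "passed"  # hypothetical
    FAILED = "failed"  # hypothetical
    KILLED = "killed"  # hypothetical

    @property
    def shorthand(self) -> str:
        """Assumed rule: the shorthand is the first letter of the name."""
        return self.name[0]


def pending_status(dependencies_done: bool) -> JobStatus:
    """Pick the waiting state for a job that has not started yet."""
    return JobStatus.QUEUED if dependencies_done else JobStatus.SCHEDULED


def find_shorthand_collisions() -> set:
    """Return every shorthand that is shared by more than one status."""
    seen, collisions = set(), set()
    for status in JobStatus:
        if status.shorthand in seen:
            collisions.add(status.shorthand)
        seen.add(status.shorthand)
    return collisions
```

A test along the lines of `assert not find_shorthand_collisions()` would then fail as soon as someone adds a state whose shorthand clashes with an existing one.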

This field will be used to inform the new scheduler of which backend it should use to execute a job. Though the plumbing is not there in the rest of DVSim, the intent is to make the scheduler such that it could feasibly be run with multiple backends (e.g. some jobs faked, some jobs on the local machine, some dispatched to various remote clusters). To support this design, each job spec can now specify that it should be run on a certain backend, with some designated string name. To instead just use the configured default backend (which is the current behaviour, as the current scheduler only supports one backend / `launcher_cls`), this can be set to `None`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
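The lookup this implies might look roughly like the following. This is a sketch under assumptions: `Spec`, `resolve_backend` and the error handling are hypothetical, and the real field name and scheduler behaviour may differ.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Spec:
    """Hypothetical job spec with an optional backend name."""

    name: str
    backend: Optional[str] = None  # None means "use the default backend"


def resolve_backend(
    spec: Spec, backends: Mapping[str, Any], default_backend: str
) -> Any:
    """Return the backend object that should execute the given job."""
    key = default_backend if spec.backend is None else spec.backend
    try:
        return backends[key]
    except KeyError:
        raise ValueError(
            f"Job {spec.name!r} requests unknown backend {key!r}"
        ) from None
```

With `backends = {"local": local, "fake": fake}` and a default of `"local"`, a spec with `backend=None` resolves to `local`, while `backend="fake"` resolves to `fake`.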

For now, this is separated in `async_core.py` - the intention is that it will eventually replace the scheduler in `core.py` when all necessary components for it to work are integrated. This commit contains the fully async scheduler design. Some notes:

- Everything is now async. The scheduler is no longer tied to a Timer object, nor does it have to manage its print interval and poll frequency. It takes advantage of parallelism via cooperative multitasking as much as possible.
- The scheduler is designed to support multiple different backends (new async versions of launchers). Jobs are dispatched according to their specifications and scheduler parameters.
- The scheduler implements the Observer pattern for various events (start, end, job status change, kill signal), allowing consumers that want to use this functionality (e.g. instrumentation, status printer) to hook into the scheduler, instead of unnecessarily coupling code.
- The previous scheduler only recognized killed jobs when they were reached in the queue and their status was updated. The new design propagates status updates transitively to all affected jobs as soon as the information is known.
- Since the scheduler knows _why_ it is killing the jobs, we attach JobStatusInfo information to give more info in the failure buckets.
- The job DAG is indexed and validated during initialization; dependency cycles are detected and cause an error to be raised.
- Job info is encapsulated by records, keeping state centralized (outside of indexes).
- The scheduler now accepts a prioritization function. It keeps jobs in a heap and dispatches the highest-priority job first. Default prioritization is by weights, but this can be customized.
- The scheduler now has its own separate modifiable parallelism limit.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
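The DAG validation and transitive kill propagation described above can be sketched as follows. This uses Kahn's algorithm for cycle detection, which is an assumption on my part rather than a statement about what `async_core.py` does; `validate_dag`, `propagate_kill` and the plain-string reasons (in place of `JobStatusInfo`) are hypothetical.

```python
from collections import deque


def _dependents(deps):
    """Invert 'job -> its dependencies' into 'job -> jobs that need it'."""
    index = {job: [] for job in deps}
    for job, needed in deps.items():
        for dep in set(needed):
            index[dep].append(job)
    return index


def validate_dag(deps):
    """Return a topological order, or raise ValueError on a bad graph.

    deps maps each job name to the names of the jobs it depends on.
    """
    for job, needed in deps.items():
        unknown = set(needed) - set(deps)
        if unknown:
            raise ValueError(f"{job!r} depends on unknown jobs {sorted(unknown)}")

    remaining = {job: len(set(needed)) for job, needed in deps.items()}
    dependents = _dependents(deps)
    ready = deque(job for job, count in remaining.items() if count == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in dependents[job]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    # Any job never reaching zero unmet dependencies sits on or behind a cycle.
    if len(order) != len(deps):
        stuck = sorted(set(deps) - set(order))
        raise ValueError(f"Dependency cycle involving: {stuck}")
    return order


def propagate_kill(root, deps):
    """Map every transitive dependent of root to the reason it is killed."""
    dependents = _dependents(deps)
    reasons = {}
    stack = [root]
    while stack:
        job = stack.pop()
        for child in dependents[job]:
            if child not in reasons:
                reasons[child] = f"dependency {job!r} did not pass"
                stack.append(child)
    return reasons
```

For a chain `build -> test -> cov`, a failure of `build` marks both `test` and `cov` as killed in a single pass, each with a reason naming the dependency that caused it, instead of waiting for each job to be reached in the queue.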

machshev

Nice work @AlexJones0!
Just a small nit.

machshev · 2026-04-01T17:23:17Z

src/dvsim/scheduler/async_core.py

+        self.backends = dict(backends)
+        self.default_backend = default_backend
+        self.max_parallelism = max_parallelism
+        self.priority_fn = priority_fn or self._default_priority
+        self.coalesce_window = coalesce_window


Do we need public attributes? This implies they can be mutated externally while the scheduler is running?
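For illustration only, one possible way to address this kind of concern is to keep the state private and expose it through read-only properties, with a validated setter for the one value the PR describes as modifiable (the parallelism limit). The class below is a hypothetical sketch, not a proposal for the actual `async_core.py` code.

```python
from types import MappingProxyType


class SchedulerSettings:
    """Hypothetical holder for scheduler configuration."""

    def __init__(self, backends, default_backend, max_parallelism):
        if default_backend not in backends:
            raise ValueError(f"Unknown default backend {default_backend!r}")
        # A read-only view, so callers cannot add or swap backends mid-run.
        self._backends = MappingProxyType(dict(backends))
        self._default_backend = default_backend
        self.max_parallelism = max_parallelism  # goes through the setter

    @property
    def backends(self):
        return self._backends

    @property
    def default_backend(self):
        return self._default_backend

    @property
    def max_parallelism(self):
        return self._max_parallelism

    @max_parallelism.setter
    def max_parallelism(self, value):
        if value < 1:
            raise ValueError("max_parallelism must be at least 1")
        self._max_parallelism = value
```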

machshev · 2026-04-01T17:26:48Z

src/dvsim/scheduler/async_core.py

+        """Prioritizes jobs according to their weight. The default prioritization method."""
+        return job.spec.weight
+
+    def _build_graph(self, specs: Iterable[JobSpec]) -> None:


For later, but it might be nice to be able to build the graph as a debug thing. Sort of a dry run mode, but more for checking the scheduler... maybe. Just a thought, we can add it in later if needed.

AlexJones0 mentioned this pull request Apr 1, 2026

Convert scheduler tests to use the new scheduler #136

Draft

AlexJones0 added 3 commits April 1, 2026 17:09

AlexJones0 force-pushed the new_async_scheduler branch from 1adbdb1 to 1dd6d12 April 1, 2026 16:10

AlexJones0 marked this pull request as ready for review April 1, 2026 16:17

AlexJones0 requested review from machshev and rswarbrick April 1, 2026 16:17

machshev approved these changes Apr 1, 2026

This was referenced Apr 1, 2026

Port the status printers to use the new async scheduler #137

Draft

Integrate the new async scheduler with the base FlowCfg #138

Draft

AlexJones0 wants to merge 3 commits into lowRISC:master from AlexJones0:new_async_scheduler
