Convert scheduler tests to use the new scheduler #136
Draft
AlexJones0 wants to merge 7 commits into lowRISC:master from
See the explanatory comments added to JobStatus. The intention is that the new async scheduler will distinguish between jobs that are blocked due to unfinished dependencies (`SCHEDULED`), and those that are pending because there is no availability to run them, despite their dependencies being fulfilled (`QUEUED`). This new state is currently unused. Also add a short test to prevent potential future bugs from status shorthand name collisions. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
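The distinction between the two waiting states, and the kind of collision check the short test guards against, can be illustrated with a minimal sketch. Only `SCHEDULED` and `QUEUED` come from the commit message; the other members, the `SHORTHAND` table, and `find_shorthand_collisions` are assumptions made for illustration and do not reflect DVSim's actual `JobStatus` definition.

```python
from collections import defaultdict
from enum import Enum, auto


class JobStatus(Enum):
    """Illustrative stand-in for DVSim's JobStatus (members are assumed)."""

    SCHEDULED = auto()  # blocked: some dependencies have not finished yet
    QUEUED = auto()     # dependencies fulfilled, waiting for a free slot
    RUNNING = auto()
    PASSED = auto()
    FAILED = auto()
    KILLED = auto()


# Hypothetical one-letter shorthands, e.g. for a compact status display.
SHORTHAND = {status: status.name[0] for status in JobStatus}


def find_shorthand_collisions(shorthand):
    """Return each shorthand string that is shared by more than one status."""
    by_short = defaultdict(list)
    for status, short in shorthand.items():
        by_short[short].append(status)
    return {short: sts for short, sts in by_short.items() if len(sts) > 1}
```

A test in this style would simply assert that `find_shorthand_collisions(SHORTHAND)` is empty, so that adding a new status with a clashing shorthand fails immediately.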
This field will be used to inform the new scheduler of which backend it should use to execute a job. Though the plumbing is not there in the rest of DVSim, the intent is to make the scheduler such that it could feasibly be run with multiple backends (e.g. some jobs faked, some jobs on the local machine, some dispatched to various remote clusters). To support this design, each job spec can now specify that it should be run on a certain backend, with some designated string name. To instead just use the configured default backend (which is the current behaviour, as the current scheduler only supports one backend / `launcher_cls`), this can be set to `None`. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
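As a rough sketch of how such a field could be consumed, the snippet below resolves a job's backend by name and falls back to a default when the field is `None`. The `JobSpec` shape shown here, the `resolve_backend` helper, and the backend names are hypothetical; the real job spec has many more fields and the real lookup may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, cut-down job spec with only the fields needed here."""

    name: str
    backend: Optional[str] = None  # None means "use the configured default"


def resolve_backend(spec, backends, default):
    """Pick the backend object for a job from a name -> backend mapping."""
    key = default if spec.backend is None else spec.backend
    try:
        return backends[key]
    except KeyError:
        raise ValueError(
            f"job {spec.name!r} requests unknown backend {key!r}"
        ) from None


# Usage: two made-up backends, one job pinned to "fake", one using the default.
backends = {"local": "local-backend", "fake": "fake-backend"}
pinned = resolve_backend(JobSpec("smoke", backend="fake"), backends, "local")
fallback = resolve_backend(JobSpec("build"), backends, "local")
```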
For now, this is separated in `async_core.py`; the intention is that it will eventually replace the scheduler in `core.py` once all the components it needs are integrated. This commit contains the fully async scheduler design. Some notes:
- Everything is now async. The scheduler is no longer tied to a Timer object, nor does it have to manage its print interval and poll frequency. It takes advantage of parallelism via cooperative multitasking as much as possible.
- The scheduler is designed to support multiple different backends (new async versions of launchers). Jobs are dispatched according to their specifications and scheduler parameters.
- The scheduler implements the Observer pattern for various events (start, end, job status change, kill signal), allowing consumers that want to use this functionality (e.g. instrumentation, status printer) to hook into the scheduler, instead of unnecessarily coupling code.
- The previous scheduler only recognized killed jobs when they were reached in the queue and their status was updated. The new design propagates status updates transitively as soon as the information is known, so all affected jobs reflect them immediately.
- Since the scheduler knows _why_ it is killing the jobs, we attach JobStatusInfo information to give more info in the failure buckets.
- The job DAG is indexed and validated during initialization; dependency cycles are detected and cause an error to be raised.
- Job info is encapsulated by records, keeping state centralized (outside of indexes).
- The scheduler now accepts a prioritization function. It stores jobs in a heap and schedules the highest-priority job first. Default prioritization is by weights, but this can be customized.
- The scheduler now has its own modifiable parallelism limit, separate from each individual backend's parallelism limit.
Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
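Several of these design points (DAG validation, a priority heap, observers, transitive kills, a scheduler-level parallelism limit) can be shown together in a heavily simplified sketch. This is not the code in `async_core.py`: the `SketchScheduler` class, its constructor arguments, the string statuses, and the use of `graphlib` for cycle detection are all assumptions chosen to keep the example short. It also assumes every job appears as a key in `deps` and that `max_parallel` is at least 1.

```python
import asyncio
import heapq
from graphlib import TopologicalSorter


class SketchScheduler:
    """Hypothetical, heavily simplified scheduler; not DVSim's actual API."""

    def __init__(self, deps, run_job, priority, max_parallel):
        # deps maps each job name to the set of job names it depends on.
        TopologicalSorter(deps).prepare()  # raises graphlib.CycleError on a cycle
        self.remaining = {job: set(d) for job, d in deps.items()}
        self.dependents = {job: [] for job in deps}
        for job, job_deps in deps.items():
            for dep in job_deps:
                self.dependents[dep].append(job)
        self.run_job = run_job            # async callable: job name -> bool
        self.priority = priority          # callable: job name -> number
        self.max_parallel = max_parallel  # scheduler-level parallelism limit
        self.status = {job: "SCHEDULED" for job in deps}
        self.observers = []               # callbacks taking (job, old, new)
        self.ready = []                   # heap of (-priority, job)

    def _set_status(self, job, new):
        old, self.status[job] = self.status[job], new
        for observer in self.observers:
            observer(job, old, new)

    def _enqueue(self, job):
        self._set_status(job, "QUEUED")
        heapq.heappush(self.ready, (-self.priority(job), job))

    def _kill_dependents(self, job):
        # Transitively kill everything that can no longer run.
        stack = list(self.dependents[job])
        while stack:
            dep = stack.pop()
            if self.status[dep] == "SCHEDULED":
                self._set_status(dep, "KILLED")
                stack.extend(self.dependents[dep])

    async def run(self):
        for job, job_deps in self.remaining.items():
            if not job_deps:
                self._enqueue(job)
        running = {}
        while self.ready or running:
            while self.ready and len(running) < self.max_parallel:
                _, job = heapq.heappop(self.ready)
                self._set_status(job, "RUNNING")
                running[asyncio.create_task(self.run_job(job))] = job
            done, _ = await asyncio.wait(
                running.keys(), return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                job = running.pop(task)
                if task.result():
                    self._set_status(job, "PASSED")
                    for dep in self.dependents[job]:
                        self.remaining[dep].discard(job)
                        if not self.remaining[dep] and self.status[dep] == "SCHEDULED":
                            self._enqueue(dep)
                else:
                    self._set_status(job, "FAILED")
                    self._kill_dependents(job)
        return self.status
```

In this sketch a job moves from `SCHEDULED` to `QUEUED` only when its last dependency passes, and a failure marks all transitive dependents as `KILLED` at once rather than when they would have been reached in a queue. An observer is just a callable appended to `observers`, which keeps consumers such as a status printer decoupled from the scheduling loop.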
The new scheduler uses an async model, so it's helpful for testing to pull in the asyncio pytest plugin. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
a5e10db to 93d1b87
machshev (Collaborator) approved these changes on Apr 1, 2026
LGTM! Thanks @AlexJones0
93d1b87 to 1a9686d
To be sure Nix users can pull in the new Python dependency. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This commit performs the changes necessary to port the scheduler tests to use the new async scheduler. This involves:
- Creating a mock RuntimeBackend. For now, to keep changes minimal and simple, we just use the LauncherAdapter with the MockLauncher. In the future it would be nice to make a dedicated mock RuntimeBackend as well.
- Marking all the tests as asyncio tests, defining them with `async def`, and using the new scheduler interface.
- Updating a couple of tests that were weirdly constructed (e.g. in terms of targets/ordering) due to constraints of the old scheduler.
With these changes, _all_ scheduler tests are now passing with the new async scheduler across multiple iterations. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
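The general shape of an asyncio-marked test against a mocked backend might look like the sketch below. It assumes the pytest-asyncio plugin is installed, and the `MockBackend` class with its `run` method is a hypothetical stand-in; the actual tests use the LauncherAdapter with the MockLauncher, whose interfaces are not reproduced here.

```python
import asyncio

import pytest


class MockBackend:
    """Hypothetical stand-in for a runtime backend; returns canned results."""

    def __init__(self, outcomes):
        self.outcomes = outcomes  # job name -> status string
        self.launched = []        # job names in the order they were started

    async def run(self, job_name):
        self.launched.append(job_name)
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self.outcomes.get(job_name, "PASSED")


@pytest.mark.asyncio
async def test_backend_reports_configured_outcomes():
    backend = MockBackend({"lint": "FAILED"})
    results = await asyncio.gather(backend.run("build"), backend.run("lint"))
    assert results == ["PASSED", "FAILED"]
    assert backend.launched == ["build", "lint"]
```

With the plugin in its default strict mode, the `@pytest.mark.asyncio` marker is what tells pytest to run the coroutine on an event loop instead of treating it as an ordinary function.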
Extend an existing test case for launcher / runtime backend parallelism to be able to also consider global scheduler-level parallelism. Introduce a new test to check that we can provide a custom prioritization function, and that jobs are indeed scheduled according to the priorities assigned by it if not blocked. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
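One way to check that two independent parallelism limits are both respected is to record the peak number of simultaneously active jobs. The sketch below models the scheduler-level and backend-level limits as two `asyncio.Semaphore` objects; this is an assumption made for illustration, and the names `ConcurrencyProbe` and `peak_concurrency` are hypothetical rather than part of the test suite.

```python
import asyncio


class ConcurrencyProbe:
    """Tracks how many fake jobs are active at once and the peak seen."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def job(self):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0)  # give other jobs a chance to start
        self.active -= 1


async def peak_concurrency(num_jobs, scheduler_limit, backend_limit):
    """Run num_jobs fake jobs under both limits and return the peak."""
    probe = ConcurrencyProbe()
    scheduler_slots = asyncio.Semaphore(scheduler_limit)
    backend_slots = asyncio.Semaphore(backend_limit)

    async def guarded():
        # A job needs a scheduler slot and a backend slot before it may run.
        async with scheduler_slots, backend_slots:
            await probe.job()

    await asyncio.gather(*(guarded() for _ in range(num_jobs)))
    return probe.peak
```

A test built on this would assert that the returned peak never exceeds `min(scheduler_limit, backend_limit)`, whichever of the two limits is the tighter one.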
1a9686d to ccf3abe
Note: this PR is currently a draft because it depends on #135, which has not yet been merged. The first 3 commits are from that PR and can be safely ignored; only the last 4 commits are relevant. It is otherwise ready to review.
This PR is the fourteenth of a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with key goals of long-term maintainability and extensibility.
This PR converts the scheduler tests to use the new async scheduler so that we can test the new design. As CI shows, the scheduler now passes all of the existing scheduler tests (plus a couple of new tests added in this PR). The intention is not to merge this PR yet: it should probably wait until all the other scheduler integration is completed and merged, and should only be merged right before making the switch between the old and new scheduler. Instead, this PR is intended to show the correctness and functionality of the new scheduler via the existing tests.
See the commit messages for more information.