Open
This commit fleshes out the abstract `RuntimeBackend` base class with much of the core functionality needed to implement new runtime backends (ones that aren't just the legacy launcher adapter). This mostly takes the form of protected methods that backends can optionally call, composed of logic that lived on the `Launcher` base class or was previously duplicated across its various subclasses.

Some key changes to note from the launchers:

- A new `DVSIM_RUN_INTERACTIVE` env var is introduced, intended to replace the `RUN_INTERACTIVE` env var long term to avoid potential name collisions.
- Errors are raised if an interactive job tries to run on a backend that doesn't support running jobs interactively.
- Log parsing functionality is extracted to a separate object; logs are always lazily loaded, so we don't waste time on jobs that don't need them (passing jobs without any fail or pass patterns).
- The pass/fail regex pattern parsing of log contents is more efficient. Fail patterns are combined into a single regex check, and all regexes are compiled once instead of per line.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
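The log-parsing changes described in this commit can be illustrated with a minimal sketch. This is not the code from the PR: the class name `LazyLogChecker`, its method names, and the rule that every pass pattern must match at least one line are all assumptions made for illustration.

```python
import re
from functools import cached_property
from pathlib import Path


class LazyLogChecker:
    """Hypothetical helper: reads the log only when a check first needs it."""

    def __init__(self, log_path, fail_patterns=(), pass_patterns=()):
        self.log_path = Path(log_path)
        # One alternation covering all fail patterns, compiled once up front.
        self._fail_re = (
            re.compile("|".join(f"(?:{p})" for p in fail_patterns))
            if fail_patterns
            else None
        )
        # Assumption: each pass pattern must be seen, so they stay separate.
        self._pass_res = [re.compile(p) for p in pass_patterns]

    @cached_property
    def lines(self):
        # Deferred until first access; never runs when there are no patterns.
        return self.log_path.read_text(errors="replace").splitlines()

    def first_failure(self):
        """Return (line_number, line) of the first fail match, else None."""
        if self._fail_re is None:
            return None
        for num, line in enumerate(self.lines, start=1):
            if self._fail_re.search(line):
                return num, line
        return None

    def missing_pass_patterns(self):
        """Return the pass patterns that no line in the log matched."""
        if not self._pass_res:
            return []
        return [
            r.pattern
            for r in self._pass_res
            if not any(r.search(line) for line in self.lines)
        ]
```

With no patterns configured, neither method touches the file, which is the point of the lazy load. Note that joining patterns into one alternation assumes they don't rely on numbered backreferences, since group numbering changes once patterns are combined.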
This is the async `RuntimeBackend` replacement of the `LocalLauncher`, which will eventually be removed in favour of this new backend.

Some behavioural differences to note:

- We now try to await() after a SIGKILL to be sure the process ended, bounded by a short timeout in case it is blocked at the kernel level.
- We now use psutil to enumerate and kill descendant processes in addition to the created subprocess. This won't catch orphaned processes (that needs e.g. cgroups), but should cover most sane usage.
- The backend does _not_ link the output directories based on status (the `JobSpec.links`, e.g. "passing/", "failed/", "killed/"). The intention is that this detail is not core functionality for either the scheduler or the backends - instead, it will be implemented as an observer on the new async scheduler callbacks when introduced.

By using async subprocesses and launching/killing jobs in batch, we are able to launch jobs in parallel more efficiently via async coroutines. We likewise avoid the need to poll jobs - instead we have an async task awaiting the subprocess' completion, which we then forward to notify the (to be added) scheduler of the job's completion.

Note that interactive jobs are still basically handled synchronously as before - it is assumed that there is only one interactive job running at a time.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
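As a rough illustration of the "await instead of poll" and "bounded wait after SIGKILL" behaviour described in this commit, here is a minimal standard-library sketch. It is not the backend's actual code: `run_job`, `kill_and_reap`, the `on_complete` callback and the timeout values are hypothetical, and the psutil-based handling of descendant processes is deliberately left out.

```python
import asyncio
import sys


async def kill_and_reap(proc, timeout=2.0):
    """Kill the process, then wait a bounded time for it to exit.

    Returns True if the exit was observed, False if the wait timed out.
    """
    if proc.returncode is not None:
        return True  # Already exited and reaped.
    try:
        proc.kill()  # SIGKILL on POSIX.
    except ProcessLookupError:
        pass  # Exited between the check above and the signal.
    try:
        await asyncio.wait_for(proc.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        return False  # Possibly stuck at the kernel level; give up waiting.
    return True


async def run_job(cmd, on_complete, run_timeout=None):
    """Launch cmd, await its exit without polling, then notify a callback."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        # The coroutine suspends here until the subprocess exits.
        await asyncio.wait_for(proc.wait(), timeout=run_timeout)
    except asyncio.TimeoutError:
        await kill_and_reap(proc)
    on_complete(proc.returncode)
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical usage: a job that outlives its time limit gets killed.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    asyncio.run(run_job(sleeper, print, run_timeout=0.5))
```

Because each job is a coroutine, many of them can be started together with `asyncio.gather` and each will report its own completion, rather than a loop repeatedly polling every process.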
a155626 to d14abc6
machshev (Collaborator) approved these changes on Apr 1, 2026 and left a comment:
Nice work! @AlexJones0
Huge improvement...
This PR is the twelfth in a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with key goals of long-term maintainability and extensibility.
This PR contains the local implementation of the `RuntimeBackend` interface, intended to replace the `LocalLauncher`. This launcher executes jobs as subprocesses on the user's local machine. The base `RuntimeBackend` is also expanded with more backend-agnostic code that will be shared between most backends that are implemented and used.

See the commit messages for more information.