Add the legacy runtime backend #130

Merged
AlexJones0 merged 4 commits into lowRISC:master from AlexJones0:legacy_runtime_backend
Apr 1, 2026
Contributor

AlexJones0 commented Mar 30, 2026

This PR is the eleventh in a series of PRs to rewrite DVSim's core scheduling functionality (Scheduler, status display, launchers / runtime backends) to use an async design, with the key goals of long-term maintainability and extensibility.

This PR contains the first introduction of the new async scheduler / Launcher code. In particular, it introduces the notion of a RuntimeBackend, which is the new async replacement for the legacy Launcher class. It then introduces LegacyLauncherAdapter, an async runtime backend that wraps the old legacy launcher classes, so that all existing Launcher implementations can still be used with the new async scheduler without being ported. The intention is that legacy launchers that are still needed and used can then be replaced incrementally over time (the Strangler Fig pattern).

Some key things to note about the RuntimeBackend interface:

  • Async interfaces are exposed to allow parallelised dispatch and I/O handling via co-operative multitasking.
  • A completion callback is registered so that runtime backends can tell the scheduler when their jobs have completed - the scheduler itself needs no explicit continuous polling or time management.
  • Interfaces are exposed for batch submission and completion, to allow execution environments that can amortize their overheads to take advantage of this optimization.

See the commit messages for more information.

AlexJones0 changed the title from "Legacy runtime backend" to "Add the legacy runtime backend" Mar 30, 2026
AlexJones0 force-pushed the legacy_runtime_backend branch from 807ebaf to 8da3d78 March 30, 2026 12:59
Collaborator

machshev left a comment

Looks good in general, just a few nits.

AlexJones0 force-pushed the legacy_runtime_backend branch from 8da3d78 to 7e1dd19 April 1, 2026 11:58
AlexJones0 marked this pull request as ready for review April 1, 2026 11:58
This is intended to mirror and eventually replace the `ErrorMessage`
model in `src/dvsim/launcher/base.py`. Notably, it extends it to include
the ability to provide no context (`None`) rather than requiring an
empty list, and it allows multiple ranges of lines to be provided (in
the form (start, end)) rather than just a single line number.

The intention is that:
 - This extended functionality can be used to provide richer error
   context where possible in the future.
 - This `JobStatusInfo` will be used by the new async scheduler and
   async backends, with the `CompletedJobStatus` eventually returning
   this as its `fail_msg`. It supersedes `ErrorMessage`, and removes the
   dependency of `job/data` on `launcher/base`.
 - While we could change all the `ErrorMessage`s to equivalent
   `JobStatusInfo` objects now, we instead retain the old type to reduce
   code churn for launchers that will be rewritten.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
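
To make the shape of the model described in the commit message above more concrete, here is a minimal sketch. It is not DVSim's actual `JobStatusInfo`: the field names (`message`, `lines`, `context`), the use of a plain dataclass, and the `describe` helper are all assumptions made for illustration. It only shows the two extensions the text mentions: context that may be `None`, and multiple `(start, end)` line ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobStatusInfo:
    """Illustrative stand-in only; field names are assumptions, not DVSim's."""

    message: str
    # Zero or more (start, end) line ranges, rather than a single line number.
    lines: tuple[tuple[int, int], ...] = ()
    # None means "no context available"; an empty list is no longer required.
    context: Optional[tuple[str, ...]] = None


def describe(info: JobStatusInfo) -> str:
    """Hypothetical helper: render the message with its line ranges."""
    ranges = ", ".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in info.lines
    )
    return f"{info.message} (lines {ranges})" if ranges else info.message
```

For example, `describe(JobStatusInfo("UVM_ERROR seen", lines=((3, 5), (9, 9))))` produces `"UVM_ERROR seen (lines 3-5, 9)"`, while an instance with no ranges and no context renders as just the message.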
This is intended to uniquely identify a job. It is exposed as a
property so it can be easily changed throughout all the new async
scheduler and runtime backend internals in a single place if it needs to
change in the future (e.g. if the `full_name` is no longer enough to
uniquely identify a job, and a tuple of more information is needed).

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
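
The idea in the commit message above, one property that defines what identifies a job, might look roughly like the sketch below. The class name `ExampleJob`, the property name `job_id`, the `index_jobs` helper, and the job name used in the usage note are hypothetical; only `full_name` comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExampleJob:
    """Hypothetical stand-in for a job description."""

    full_name: str

    @property
    def job_id(self) -> str:
        # The single place that decides what identifies a job. If
        # `full_name` stops being unique, only this property (and its
        # return type) has to change, e.g. to a tuple of several fields.
        return self.full_name


def index_jobs(jobs: Iterable[ExampleJob]) -> dict[str, ExampleJob]:
    """Scheduler-style lookup that depends on `job_id`, never on `full_name`."""
    return {job.job_id: job for job in jobs}
```

With this arrangement, `index_jobs([ExampleJob(full_name="uart_smoke.1")])` is keyed by whatever `job_id` returns, so callers never need to know which fields make up the identifier.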
This commit introduces the _public interface_ of the new
`RuntimeBackend` abstract base class, and generally introduces the
notion of "Runtime Backends" as a whole.

Runtime backends are intended to replace the existing concept of
Launchers: they expose async interfaces for submitting and killing jobs,
and for being notified of completion. The new name is chosen because,
whilst the Launchers do indeed "Launch" jobs, they also maintain, poll
and kill them, which is not really represented in the old name.

Submitting jobs to backends returns a handle to the job, which the async
scheduler can then use to interact with the different runtime backends.

Importantly, methods are provided for batch submitting and batch killing
many jobs at once - this is because many launcher backends could offer
optimized interfaces for batch actions by amortizing overheads.

A single protected method `_emit_completion` is also introduced in this
commit to show how the completion callback is intended to be used. No
other launcher logic related to the output directories, status checks,
log files, job environment, and job callbacks is included at this stage.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
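
As a rough illustration of the interface described in the commit message above, the sketch below shows an abstract base class with batch submit/kill methods, single-job wrappers, a job handle, and a completion callback. Apart from `_emit_completion`, every name and signature here (`RuntimeBackendSketch`, `JobHandle`, `attach_completion_callback`, `submit_many`, `kill_many`, the string status) is an assumption for illustration and should not be read as DVSim's actual API.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class JobHandle:
    """Hypothetical handle returned to the scheduler on submission."""

    job_id: str
    backend: str


CompletionCallback = Callable[[JobHandle, str], None]


class RuntimeBackendSketch(ABC):
    name = "sketch"

    def __init__(self) -> None:
        self._on_completion: Optional[CompletionCallback] = None

    def attach_completion_callback(self, callback: CompletionCallback) -> None:
        self._on_completion = callback

    @abstractmethod
    async def submit_many(self, job_ids: Iterable[str]) -> list[JobHandle]:
        """Batch submission, so a backend can amortize per-call overheads."""

    @abstractmethod
    async def kill_many(self, handles: Iterable[JobHandle]) -> None:
        """Batch kill, for the same reason."""

    async def submit(self, job_id: str) -> JobHandle:
        (handle,) = await self.submit_many([job_id])
        return handle

    async def kill(self, handle: JobHandle) -> None:
        await self.kill_many([handle])

    def _emit_completion(self, handle: JobHandle, status: str) -> None:
        # The backend pushes completion to the scheduler; nobody polls it.
        if self._on_completion is not None:
            self._on_completion(handle, status)


class InstantBackend(RuntimeBackendSketch):
    """Toy backend whose jobs pass immediately, to exercise the callback."""

    name = "instant"

    async def submit_many(self, job_ids: Iterable[str]) -> list[JobHandle]:
        handles = [JobHandle(job_id, self.name) for job_id in job_ids]
        for handle in handles:
            self._emit_completion(handle, "passed")
        return handles

    async def kill_many(self, handles: Iterable[JobHandle]) -> None:
        return None


def demo() -> list[str]:
    finished: list[str] = []
    backend = InstantBackend()
    backend.attach_completion_callback(
        lambda handle, status: finished.append(f"{handle.job_id}:{status}")
    )
    asyncio.run(backend.submit_many(["a", "b"]))
    return finished
```

Calling `demo()` returns `["a:passed", "b:passed"]`. In this sketch the single-job methods are thin wrappers over the batch ones, so a backend only has to implement the batch path; whether the real class is layered this way is not stated in the source.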
This acts as an adapter interface for the legacy non-async Launcher
classes so that they can be used with the upcoming new async Scheduler
and RuntimeBackend interfaces without needing to completely rewrite
them.

This is important because many existing launchers are at present
difficult to test or validate. We can minimise breakage to downstream
dependencies by following the Strangler Fig pattern: incrementally
replacing the legacy launchers, one by one, with new runtime backend
implementations.

The legacy launcher adapter tries to respect all features originally implemented
by the current synchronous scheduler - poll frequency, max poll limits,
parallelism limits, error messages, etc. This is all managed by an
asynchronous polling task (`_poller`) which polls jobs in a FIFO poll
queue at set intervals. Submitting and killing jobs then just requires
launching and killing the underlying launchers and modifying the poll
queue accordingly.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
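
The polling arrangement described in the commit message above can be sketched as follows. This is a simplified illustration, not the real `LegacyLauncherAdapter`: the `FakeLegacyLauncher` class, the assumption that a legacy launcher offers synchronous `launch()`, `poll()` and `kill()` methods, the `max_polls` parameter, and the choice to stop the poller when the queue is empty are all made up for the example. Only the name `_poller` and the FIFO poll queue come from the text.

```python
import asyncio
from collections import deque
from typing import Callable, Deque, Optional


class FakeLegacyLauncher:
    """Stand-in for a synchronous launcher that finishes after N polls."""

    def __init__(self, name: str, polls_needed: int) -> None:
        self.name = name
        self._remaining = polls_needed

    def launch(self) -> None:
        pass

    def poll(self) -> Optional[str]:
        self._remaining -= 1
        return "passed" if self._remaining <= 0 else None

    def kill(self) -> None:
        pass


class LegacyAdapterSketch:
    def __init__(self, on_done: Callable[[str, str], None],
                 poll_interval: float = 0.01, max_polls: int = 2) -> None:
        self._on_done = on_done
        self._interval = poll_interval
        self._max_polls = max_polls  # cap on launchers polled per tick
        self._queue: Deque[FakeLegacyLauncher] = deque()

    def submit(self, launcher: FakeLegacyLauncher) -> None:
        launcher.launch()
        self._queue.append(launcher)

    def kill(self, launcher: FakeLegacyLauncher) -> None:
        launcher.kill()
        if launcher in self._queue:
            self._queue.remove(launcher)
        self._on_done(launcher.name, "killed")

    async def _poller(self) -> None:
        # Sketch only: runs until the queue drains.
        while self._queue:
            for _ in range(min(self._max_polls, len(self._queue))):
                launcher = self._queue.popleft()
                status = launcher.poll()
                if status is None:
                    self._queue.append(launcher)  # still running: requeue
                else:
                    self._on_done(launcher.name, status)
            await asyncio.sleep(self._interval)  # yield to other tasks


def demo() -> list[tuple[str, str]]:
    done: list[tuple[str, str]] = []
    adapter = LegacyAdapterSketch(lambda name, status: done.append((name, status)))
    adapter.submit(FakeLegacyLauncher("fast", polls_needed=1))
    adapter.submit(FakeLegacyLauncher("slow", polls_needed=3))
    asyncio.run(adapter._poller())
    return done
```

Here `demo()` returns `[("fast", "passed"), ("slow", "passed")]`. The blocking `poll()` calls run inside one coroutine, and the `await asyncio.sleep(...)` between ticks is what lets the rest of an async scheduler make progress while unported launchers are being polled.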
Collaborator

machshev left a comment

Nice work! Thanks @AlexJones0

AlexJones0 added this pull request to the merge queue Apr 1, 2026
Merged via the queue into lowRISC:master with commit 65df3d6 Apr 1, 2026
6 checks passed