Separate building/scheduling from storage#15542

Open
lisanna-dettwyler wants to merge 1 commit intoNixOS:masterfrom
lisanna-dettwyler:build-store

@lisanna-dettwyler
Contributor

Separate building/scheduling from storage

This is the major first step of #5025.

Motivation

Right now, there is a bit of conceptual tension between --store and
--builders:

  • With --store, it is very convenient to think that the store knows
    how to build. One specifies a store, and gets a different method of
    building (local, some sort of remote) and scheduling (the remote store
    can take an entire derivation graph, multiple jobs) accordingly.

  • With --builders one has a local scheduler. Stores either act as a
    passive "workbench" for building (the local case) or we give them a
    single job (a single ready-to-build derivation) at a time.

    For historical reasons, this doesn't even use the store interface,
    except on the "other side" of the build hook.

Issue #5025 is about using the store interface for --builders. In this
case, we want to invert the relationship between Store and Worker.
We have no use in this case for various Store methods creating a
Worker behind the scenes, because we already have our Worker:

  • LocalStores don't need any build method at all: our Worker can
    use the local store directly, as it does with the default
    worker.store today.

  • Remote stores that support building (ssh:// and ssh-ng://) are only
    fed a single job at a time, so their remote-side scheduling is
    overkill for the task at hand.

But we can't just delete the building methods that the --builders case
no longer needs, because that would break --store building. We need to
support both cases: some stores effectively build and schedule themselves,
while Worker can also own or borrow stores and act as a single, unified
scheduler.

This change

The way we satisfy both goals is by:

  • Pulling the building methods out of Store into a new Builder class

  • Having some stores also give/implement Builder

Separating Store from Builder serves the --builders use case, and the
larger project of making it use Store and other C++ interfaces directly,
without going through the build hook or other ad-hoc ways of swapping
implementations. Here's how: instead of default Store:: method
implementations creating a Worker on the fly, Worker implements Builder,
and those methods, now on Builder, become Worker's own implementation.
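
The split described above can be sketched as follows. This is only a minimal illustration, not the code in this PR: the member functions, the MemoryStore type, and the string-based paths are hypothetical simplifications chosen so the example stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real Nix classes.
struct Store {
    virtual ~Store() = default;
    virtual bool isValidPath(const std::string & path) const = 0;
    virtual void addPath(const std::string & path) = 0;
    // Note: no build methods here any more.
};

// Building and scheduling live on their own interface.
struct Builder {
    virtual ~Builder() = default;
    // Returns the paths that actually had to be built.
    virtual std::vector<std::string> buildPaths(const std::vector<std::string> & wanted) = 0;
};

// Toy in-memory store, standing in for a local store (assumption).
struct MemoryStore : Store {
    std::vector<std::string> paths;
    bool isValidPath(const std::string & path) const override {
        return std::find(paths.begin(), paths.end(), path) != paths.end();
    }
    void addPath(const std::string & path) override { paths.push_back(path); }
};

// The scheduler implements Builder itself and uses a store as its workbench,
// instead of a Store method creating a Worker behind the scenes.
struct Worker : Builder {
    std::shared_ptr<Store> store;

    explicit Worker(std::shared_ptr<Store> s) : store(std::move(s)) { assert(store); }

    std::vector<std::string> buildPaths(const std::vector<std::string> & wanted) override {
        std::vector<std::string> built;
        for (const auto & path : wanted) {
            if (store->isValidPath(path)) continue; // already present, nothing to do
            store->addPath(path);                   // toy "build" into the store
            built.push_back(path);
        }
        return built;
    }
};
```

Callers only ever see a Builder; whether it is a Worker or something else is an implementation detail.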

Local building

To implement this conceptual switch, the methods that delegated directly
to the worker are removed from Store and put in the new Builder class.
For example, build/entry-points.cc now contains Worker methods (the
virtual method implementations of Builder) rather than Store method
implementations.

Worker also now owns ref<...> srcStore, destStore, taken directly from
issue #5025. (Store & store, evalStore are kept as aliasing references
to reduce churn.)

(Worker should be renamed to LocalBuilder, since it is the local
build scheduler, and additionally knows how to build in local stores.)

Remote building

What about the --store case? The remote stores have a new method that
provides a Builder of their choice, given an evalStore. (This reflects
the fact that Builder no longer has evalStore parameters on its
methods.) That method lives on a new interface, RemoteBuildStore, which
RemoteStore and LegacySSHStore implement. Each one has an unexposed
Builder implementation that does everything over RPC, as today.
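
A rough sketch of that shape, under stated assumptions: getBuilder is named in this description, but its exact signature, the Toy* types, getUri, and buildDerivation are hypothetical and exist only to make the example self-contained. The point is that evalStore is bound once, when the Builder is obtained, rather than passed to every build method.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified interfaces.
struct Store {
    virtual ~Store() = default;
    virtual std::string getUri() const = 0;
};

struct Builder {
    virtual ~Builder() = default;
    // No evalStore parameter: it was fixed when the Builder was created.
    virtual std::string buildDerivation(const std::string & drvPath) = 0;
};

// Stores that can build remotely hand out a Builder of their choice.
struct RemoteBuildStore {
    virtual ~RemoteBuildStore() = default;
    virtual std::shared_ptr<Builder> getBuilder(std::shared_ptr<Store> evalStore) = 0;
};

struct ToyLocalStore : Store {
    std::string getUri() const override { return "local"; }
};

struct ToyRemoteStore : Store, RemoteBuildStore {
    std::string getUri() const override { return "ssh-ng://example"; }
    std::shared_ptr<Builder> getBuilder(std::shared_ptr<Store> evalStore) override;
};

namespace {
// Unexposed Builder implementation; a real one would speak the RPC protocol.
struct ToyRpcBuilder : Builder {
    std::string remoteUri;
    std::shared_ptr<Store> evalStore;

    ToyRpcBuilder(std::string uri, std::shared_ptr<Store> eval)
        : remoteUri(std::move(uri)), evalStore(std::move(eval)) { assert(evalStore); }

    std::string buildDerivation(const std::string & drvPath) override {
        return "RPC to " + remoteUri + ": build " + drvPath
            + " (drv read from " + evalStore->getUri() + ")";
    }
};
}

std::shared_ptr<Builder> ToyRemoteStore::getBuilder(std::shared_ptr<Store> evalStore) {
    return std::make_shared<ToyRpcBuilder>(getUri(), std::move(evalStore));
}
```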

Putting it all together

Introduce:

  • getLocalBuilder, a freestanding function to make a Worker.
    (Really, the details of Worker should be considered private to the
    build/*.cc files)

  • getDefaultBuilder, a freestanding function which will call
    RemoteBuildStore::getBuilder for RemoteBuildStores, and use
    getLocalBuilder otherwise.
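
The dispatch in getDefaultBuilder could look roughly like this. The function names come from this description; the parameter lists, the use of dynamic_cast, and the Toy* types are assumptions made for the sake of a compilable sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified interfaces.
struct Store { virtual ~Store() = default; };

struct Builder {
    virtual ~Builder() = default;
    virtual std::string kind() const = 0;
};

struct RemoteBuildStore {
    virtual ~RemoteBuildStore() = default;
    virtual std::shared_ptr<Builder> getBuilder(std::shared_ptr<Store> evalStore) = 0;
};

// Local scheduler; its details would stay private to build/*.cc.
struct Worker : Builder {
    std::shared_ptr<Store> store, evalStore;
    Worker(std::shared_ptr<Store> s, std::shared_ptr<Store> e)
        : store(std::move(s)), evalStore(std::move(e)) { assert(store && evalStore); }
    std::string kind() const override { return "local scheduler"; }
};

struct ToyRpcBuilder : Builder {
    std::string kind() const override { return "remote RPC"; }
};

struct ToyLocalStore : Store {};

struct ToySshStore : Store, RemoteBuildStore {
    std::shared_ptr<Builder> getBuilder(std::shared_ptr<Store>) override {
        return std::make_shared<ToyRpcBuilder>();
    }
};

// Freestanding factory for the local scheduler.
std::shared_ptr<Builder> getLocalBuilder(std::shared_ptr<Store> store, std::shared_ptr<Store> evalStore) {
    return std::make_shared<Worker>(std::move(store), std::move(evalStore));
}

// Ask the store for its own Builder if it offers one; otherwise schedule locally.
std::shared_ptr<Builder> getDefaultBuilder(std::shared_ptr<Store> store, std::shared_ptr<Store> evalStore) {
    if (auto * remote = dynamic_cast<RemoteBuildStore *>(store.get()))
        return remote->getBuilder(std::move(evalStore));
    return getLocalBuilder(std::move(store), std::move(evalStore));
}
```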

Future work

Issue #1221

The next step of the #5025 saga is issue #1221. To solve that issue,
Worker will not use the build hook, but instead work via C++. In particular,
it will do this:

  • If the builder is a RemoteBuildStore, use an appropriate method (possibly
    yet to be created) on the remote building store's (remote) Builder.

  • If the builder is a LocalStore, Worker should not create another
    Worker (as the build-remote program would do today) but instead directly
    manage building in that local store, so we avoid n Worker instances
    scheduling independently, with no coordination between them.

  • Otherwise, fail. This matches what happens today, just in fewer
    steps.
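
As a sketch of the three cases listed above (future work, so everything here is hypothetical): the Dispatch enum, chooseDispatch, and the Toy* store types are invented for illustration, and the real decision would be made inside Worker.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified class hierarchy.
struct Store { virtual ~Store() = default; };
struct LocalStore : Store {};
struct RemoteBuildStore { virtual ~RemoteBuildStore() = default; };
struct ToySshStore : Store, RemoteBuildStore {};
struct ToyBinaryCache : Store {}; // a store that cannot build at all

enum class Dispatch {
    DelegateToRemoteBuilder,   // hand a single job to the store's (remote) Builder
    BuildDirectlyInLocalStore, // the one Worker manages the build itself
};

// Decide how the single scheduler should use a configured builder store.
Dispatch chooseDispatch(Store & builderStore) {
    if (dynamic_cast<RemoteBuildStore *>(&builderStore))
        return Dispatch::DelegateToRemoteBuilder;
    if (dynamic_cast<LocalStore *>(&builderStore))
        return Dispatch::BuildDirectlyInLocalStore; // no second Worker is created
    throw std::invalid_argument("store cannot be used as a builder");
}
```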

Simplifying the RPC case

I also suspect that longer term, those stores will just implement
Builder directly, as evalStore doesn't really make sense for RPC
endpoints when the remote side has no idea what the caller is doing with
other stores. We won't need RemoteBuildStore anymore then.

Other improvements

Recursive Nix

Speaking of avoiding redundant schedulers: RestrictedStore, when it
used to override the store building methods, would spin up a new
Worker for each recursive Nix build call. This is again bad --- we
should have a central scheduler that takes in dynamic jobs, same for
recursive Nix and dynamic derivations. Now this is almost, but not
quite, fixed. Three changes were made:

  • RestrictedBuilder was split out from RestrictedStore to wrap the
    build methods.
  • processConnection takes an optional Builder parameter, using it
    directly rather than spinning one up with getDefaultBuilder.
  • DerivationBuilder took a callback to process the connection for
    recursive Nix, so the caller could provide the processConnection
    call with the ambient worker in order to reuse it.

Reusing the ambient worker this way would have solved the redundant
scheduler problem very nicely. Unfortunately it deadlocked, so instead
the caller explicitly creates a fresh worker (as before, but no longer
hidden beneath layers of abstraction), with a TODO noting that the
deadlock should be fixed and the fresh worker removed.
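
The optional Builder parameter on processConnection can be pictured like this. It is a toy sketch only: the real function takes many more arguments and returns nothing like a string, and FreshBuilder, AmbientWorker, and describe are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified interface.
struct Builder {
    virtual ~Builder() = default;
    virtual std::string describe() const = 0;
};

struct FreshBuilder : Builder {
    std::string describe() const override { return "fresh"; }
};

struct AmbientWorker : Builder {
    std::string describe() const override { return "ambient"; }
};

// Stand-in for the real getDefaultBuilder, which takes store arguments.
std::shared_ptr<Builder> getDefaultBuilder() {
    return std::make_shared<FreshBuilder>();
}

// If the caller supplies a Builder, use it directly; otherwise create one.
std::string processConnection(std::shared_ptr<Builder> builder = nullptr) {
    if (!builder)
        builder = getDefaultBuilder();
    return "handled by " + builder->describe();
}
```

Passing the ambient worker is the reuse that deadlocked; passing a freshly created one corresponds to the current workaround.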

LegacySSHStore fix

As a final note, the old LegacySSHStore did not override
buildPathsWithResults, which meant that when specifying an ssh://
store, the local scheduler was being erroneously used for some commands.
Now, LegacySSHBuilder::buildPathsWithResults uses a single
buildPathsRaw call (which sends the serve protocol BuildPaths
command and returns std::variant<BuildResultSuccessStatus, BuildError>
with the error message already read from the wire), and then queries
realisations to reconstruct the BuildResults --- code similar to the
old fallback code for ssh-ng://.
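
The reconstruction step can be illustrated with a small sketch. The names BuildResultSuccessStatus, BuildError, and BuildResult appear in this description, but their definitions below are simplified guesses, and reconstructResults with its realisations map is a hypothetical stand-in for querying realisations from the store.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Simplified guesses at the shapes of the real types.
enum class BuildResultSuccessStatus { Built, AlreadyValid };
struct BuildError { std::string msg; };
using RawBuildOutcome = std::variant<BuildResultSuccessStatus, BuildError>;

struct BuildResult {
    std::string drvOutput;
    bool success = false;
    std::string outPath;
    std::string errorMsg;
};

// One raw outcome for the whole request is expanded into one result per wanted
// output. The map stands in for querying realisations (assumption).
std::vector<BuildResult> reconstructResults(
    const RawBuildOutcome & outcome,
    const std::vector<std::string> & wanted,
    const std::map<std::string, std::string> & realisations)
{
    std::vector<BuildResult> results;
    for (const auto & id : wanted) {
        BuildResult r;
        r.drvOutput = id;
        if (const auto * err = std::get_if<BuildError>(&outcome)) {
            r.errorMsg = err->msg; // the failure applies to every requested output
        } else if (auto it = realisations.find(id); it != realisations.end()) {
            r.success = true;
            r.outPath = it->second;
        } else {
            r.errorMsg = "no realisation found for " + id;
        }
        results.push_back(r);
    }
    return results;
}
```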


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions (bot) added the new-cli, with-tests, store, repl, fetching, and c api labels on Mar 23, 2026
This is the major first step of NixOS#5025.

Motivation
----------

Right now, there is a bit of conceptual tension between `--store` and
`--builders`:

- With `--store`, it is very convenient to think that the store knows
  how to build. One specifies a store, and gets a different method of
  building (local, some sort of remote) and scheduling (the remote store
  can take an entire derivation graph, multiple jobs) accordingly.

- With `--builders` one has a local scheduler. Stores either act as a
  passive "workbench" for building (the local case) or we give them a
  single job (a single ready-to-build derivation) at a time.

  For historical reasons, this doesn't even use the store interface,
  except on the "other side" of the build hook.

Issue NixOS#5025 is about using the store interface for `--builders`. In this
case, we want to invert the relationship between `Store` and `Worker`.
We have no use in this case for various `Store` methods creating a
`Worker` behind the scenes, because we already have our `Worker`:

- `LocalStore`s don't need any build method at all: our `Worker` can
  use the local store directly, as it does with the default
  `worker.store` today.

- Remote stores that support building (`ssh://` and `ssh-ng://`) are only
  fed a single job at a time, so their remote-side scheduling is
  overkill for the task at hand.

But we can't just delete the building methods of `--store` that we don't
need anymore, because that would break `--store` building. We need to
support both cases, where some stores effectively build/schedule and
`Worker` can also own/borrow stores to be a single, unified scheduler.

This change
-----------

The way we satisfy both goals is by:

- Pulling the building methods out of `Store` into a new `Builder` class

- Having some stores also give/implement `Builder`

Separating `Store` from `Builder` serves the `--builders` use case, and the
larger project of making it use `Store` and other C++ interfaces directly,
without going through the build hook or other ad-hoc ways of swapping
implementations. Here's how: instead of default `Store::` method
implementations creating a `Worker` on the fly, `Worker` implements
`Builder`, and those methods, now on `Builder`, become `Worker`'s own
implementation.

Local building

To implement this conceptual switch, the methods that delegated directly
to the worker are removed from `Store` and put in the new `Builder` class.
For example, `build/entry-points.cc` now contains `Worker` methods (the
virtual method implementations of `Builder`) rather than `Store` method
implementations.

`Worker` also now owns `ref<...> srcStore, destStore`, directly out of
issue NixOS#5025. (`Store & store, evalStore` are kept as aliasing references
to reduce churn.)

(`Worker` should be renamed to `LocalBuilder`, since it is the local
build scheduler, and additionally knows how to build in local stores.)

Remote building

What about the `--store` case? The remote stores have a new method that
provides a `Builder` of their choice, given an `evalStore`. (This reflects
the fact that `Builder` no longer has `evalStore` parameters on its
methods.) That method lives on a new interface, `BuildStore`, which
`RemoteStore` and `LegacySSHStore` implement. Each one has an unexposed
`Builder` implementation that does everything over RPC, as today.

Putting it all together

Introduce:

- `getLocalBuilder`, a freestanding function to make a `Worker`.
  (Really, the details of `Worker` should be considered private to the
  `build/*.cc` files)

- `getDefaultBuilder`, a freestanding function which will call
  `BuildStore::getBuilder` for `BuildStore`s, and use
  `getLocalBuilder` otherwise.

Future work
-----------

Issue NixOS#1221

The next step of the NixOS#5025 saga is issue NixOS#1221. To solve that issue,
`Worker` will not use the build hook, but instead work via C++. In particular,
it will do this:

- If the builder is a `BuildStore`, use an appropriate method (possibly
  yet to be created) on the remote building store's (remote) `Builder`.

- If the builder is a `LocalStore`, `Worker` should *not* create another
  `Worker` (as the `build-remote` program would do today) but instead directly
  manage building in that local store, so we avoid **n** `Worker` instances
  scheduling independently, with no coordination between them.

- Otherwise, fail. This matches what happens today, just in fewer
  steps.

Simplifying the RPC case

I also suspect that longer term, those stores will just implement
`Builder` directly, as `evalStore` doesn't really make sense for RPC
endpoints when the remote side has no idea what the caller is doing with
other stores. We won't need `BuildStore` anymore then.

Other improvements
------------------

Recursive Nix

Speaking of avoiding redundant schedulers: `RestrictedStore`, when it
used to override the store building methods, would spin up a new
`Worker` for each recursive Nix build call. This is again bad --- we
should have a central scheduler that takes in dynamic jobs, same for
recursive Nix and dynamic derivations. Now this is *almost*, but not
quite, fixed. Three changes were made:

- `RestrictedBuilder` was split out from `RestrictedStore` to wrap the
  build methods.
- `processConnection` takes an optional `Builder` parameter, using it
  directly rather than spinning one up with `getDefaultBuilder`.
- `DerivationBuilder` took a callback to process the connection for
  recursive Nix, so the caller could provide the `processConnection`
  call with the ambient worker in order to reuse it.

Reusing the ambient worker this way would have solved the redundant
scheduler problem very nicely. Unfortunately it deadlocked, so instead
the caller explicitly creates a fresh worker (as before, but no longer
hidden beneath layers of abstraction), with a TODO noting that the
deadlock should be fixed and the fresh worker removed.

`LegacySSHStore` fix

As a final note, the old `LegacySSHStore` did not override
`buildPathsWithResults`, which meant that when specifying an `ssh://`
store, the local scheduler was being erroneously used for some commands.
Now, `LegacySSHBuilder::buildPathsWithResults` uses a single
`buildPathsRaw` call (which sends the serve protocol `BuildPaths`
command and returns `std::variant<BuildResultSuccessStatus, BuildError>`
with the error message already read from the wire), and then queries
realisations to reconstruct the `BuildResult`s --- code similar to the
old fallback code for `ssh-ng://`.

Use std::shared_ptr for processConnection's builder

This avoids the need to pass a raw pointer.

Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>

Rename BuildStore to BuildStore

Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Labels

c api Nix as a C library with a stable interface fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command repl The Read Eval Print Loop, "nix repl" command and debugger store Issues and pull requests concerning the Nix store with-tests Issues related to testing. PRs with tests have some priority

3 participants