lib: add krun_set_init_binary, backcompat via new krun_init crate #593

Open
ggoodman wants to merge 1 commit into containers:main from ggoodman:feat/init-api

@ggoodman
Contributor

Intent

Our goal with this PR is to prepare for a future Rust API where the main crate does not always embed or compile an init binary as part of being usable.

Today, init.krun is effectively a hard-wired implementation detail of the lower layers (especially devices), which makes sense for the current C-oriented flow, but creates friction for a slim Rust-first API design. We want to preserve the existing behavior for C consumers while moving toward a model where init payload is an explicit input, not an unconditional side effect of depending on core crates.

In short: keep the current default UX where needed, but stop forcing the init payload into every build shape.

Approach

This PR splits policy from mechanism and introduces an explicit init contract.

First, it moves default init embedding/build responsibilities into a dedicated sibling crate (krun_init) instead of keeping that logic in devices. That lets us preserve current behavior (default payload is still available) without making devices itself always own the init artifact.

Second, it changes virtio-fs passthrough to accept init payload through configuration (Option<Arc<[u8]>>) rather than via a crate-level static include. The synthetic /init.krun inode is now exposed only when payload is present, which makes the behavior explicit and composable.
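A minimal sketch of this second point, under stated assumptions: `FsConfig`, `INIT_KRUN_INODE`, and `lookup_root` are hypothetical names chosen for illustration and are not taken from the PR. It only shows how an optional payload can gate whether the synthetic entry is visible at all.

```rust
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the passthrough fs configuration.
pub struct FsConfig {
    /// Init bytes supplied by the embedder; `None` means no synthetic file.
    pub init_payload: Option<Arc<[u8]>>,
}

/// Hypothetical inode number reserved for the synthetic /init.krun entry.
pub const INIT_KRUN_INODE: u64 = 2;

/// Resolve a name in the root directory to the synthetic inode, if one exists.
/// The entry is only visible when a payload has been configured.
pub fn lookup_root(cfg: &FsConfig, name: &str) -> Option<u64> {
    if name == "init.krun" && cfg.init_payload.is_some() {
        Some(INIT_KRUN_INODE)
    } else {
        None
    }
}
```

With no payload configured, a lookup of `init.krun` falls through to whatever the real filesystem contains, which is what makes the behavior composable.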

Third, it replaces the stringly cmdline check with a typed boot contract. We now express the requirement as InitPolicy::InitKrunFromVirtioFs, and enforce it centrally in vmm::builder. This keeps validation robust and avoids heuristic coupling to command-line text parsing.
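The typed contract could look roughly like the sketch below. Only `InitPolicy::InitKrunFromVirtioFs` is named in this PR; the `External` variant, `BuildError`, and `validate_init` are assumptions made up for the example, and the real check in vmm::builder may be shaped differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitPolicy {
    /// Hypothetical variant: the guest does not rely on /init.krun.
    External,
    /// The guest expects /init.krun to be served by a virtio-fs device.
    InitKrunFromVirtioFs,
}

/// Hypothetical error type for the builder-side validation.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildError {
    MissingInitPayload,
}

/// Check the boot contract against the payloads configured on the fs devices.
/// No command-line text is inspected; the policy value alone drives the check.
pub fn validate_init(
    policy: InitPolicy,
    fs_payloads: &[Option<&[u8]>],
) -> Result<(), BuildError> {
    match policy {
        InitPolicy::External => Ok(()),
        InitPolicy::InitKrunFromVirtioFs => {
            if fs_payloads.iter().any(|p| p.is_some()) {
                Ok(())
            } else {
                Err(BuildError::MissingInitPayload)
            }
        }
    }
}
```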

Fourth, on the C side, it keeps backward-compatible defaults by auto-wiring the default init payload, and additionally introduces a runtime override API (krun_set_init_binary) so callers can inject their own init bytes for a context.

Trade-offs

The main benefit is architectural clarity: core crates no longer implicitly force init embedding semantics, and the contract around /init.krun is explicit and validated in one place. This also gives us a clean migration path toward a Rust API where init choice is deliberate.

The cost is extra plumbing and a slightly larger configuration surface (init_policy, payload propagation through fs config). We also introduce another crate (krun_init) in the dependency graph, which is intentional but still additional structure to maintain.

Overall, this is a deliberate trade: modest complexity increase now in exchange for a cleaner long-term API boundary and less hidden policy in shared internals.

This was referenced Mar 17, 2026
@ggoodman ggoodman marked this pull request as draft March 17, 2026 19:44
* Returns:
* Zero on success or a negative error number on failure.
*/
int32_t krun_set_init_binary(uint32_t ctx_id, const uint8_t *init_binary, size_t init_binary_len);
Member

krun_set_init would be a bit more succinct.

Suggested change
int32_t krun_set_init_binary(uint32_t ctx_id, const uint8_t *init_binary, size_t init_binary_len);
int32_t krun_set_init(uint32_t ctx_id, const uint8_t *bin, size_t size);

Contributor Author

Updated

/// Table of exported FDs to share with other subsystems. Not supported for macos.
pub export_table: Option<ExportTable>,
pub allow_root_dir_delete: bool,
pub init_payload: Option<Arc<[u8]>>,
Member

Should this perhaps be an Arc<[u8]> rather than Option<Arc<[u8]>>, in which if there is no init binary set, we fall back to the default?

Contributor Author

Wouldn't that result in any binary built off this now being forced to include the bytes of the default init? My intention here is to avoid that by putting it in another compilation unit and forcing the user to provide it.

But to maintain parity with the C API, I did the default fallback.

Member

Yea, I guess there is no way to "fall back" without including the bytes within the compilation.

But are we requiring that an init binary always be set now?

Contributor Author

If we are, that is not the intention. My intention was as follows:

  1. For the current C API, always fall back to the current init.
  2. For the speculative Rust API, require that an init binary be configured. Document that the built-in init binary is in the libkrun_init crate.
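The two behaviors in this list can be contrasted in a short sketch. Everything here is illustrative: `DEFAULT_INIT` is a placeholder for the bytes the init crate would embed, the function names are hypothetical, and plain static byte slices are used for brevity.

```rust
/// Placeholder for the embedded default init bytes (assumption for this sketch).
pub static DEFAULT_INIT: &[u8] = b"default-init";

/// C-API-style resolution: silently fall back to the default payload.
pub fn resolve_for_c_api(configured: Option<&'static [u8]>) -> &'static [u8] {
    configured.unwrap_or(DEFAULT_INIT)
}

/// Rust-API-style resolution: the embedder must make an explicit choice,
/// so a missing payload is reported instead of being papered over.
pub fn resolve_for_rust_api(
    configured: Option<&'static [u8]>,
) -> Result<&'static [u8], &'static str> {
    configured.ok_or("no init binary configured")
}
```

Only the first function needs the default bytes linked in, which is why the fallback can live in a separate crate.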

Comment on lines +1251 to +1253
if off >= init_payload.len() {
    return Ok(0);
}
Member

Can you explain why zero is returned?

Contributor Author

Added a comment. Read past EOF should return 0 bytes read.
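For context, a bounded read over an in-memory payload might look like the sketch below. `read_payload` is a hypothetical helper, not the function in the PR; it follows the usual read() convention that an offset at or beyond the end yields zero bytes rather than an error.

```rust
use std::io;

/// Copy up to `buf.len()` bytes of `payload`, starting at `off`, into `buf`.
/// Returns the number of bytes copied.
pub fn read_payload(payload: &[u8], off: usize, buf: &mut [u8]) -> io::Result<usize> {
    // Reading at or past EOF is not an error: it reports 0 bytes read.
    if off >= payload.len() {
        return Ok(0);
    }
    // Clamp the copy to what is left in the payload.
    let n = buf.len().min(payload.len() - off);
    buf[..n].copy_from_slice(&payload[off..off + n]);
    Ok(n)
}
```

The early return also keeps the subtraction `payload.len() - off` from underflowing.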

Comment on lines +1475 to +1477
if off >= init_payload.len() {
    return Ok(0);
}
Member

Ditto here, why return zero?

Contributor Author

Added a comment. Read past EOF should return 0 bytes read.

@ggoodman ggoodman force-pushed the feat/init-api branch 4 times, most recently from 1bdb760 to f5b0594 Compare March 18, 2026 02:28
@ggoodman ggoodman marked this pull request as ready for review March 18, 2026 02:33
return -libc::EINVAL;
}

let payload = Arc::<[u8]>::from(slice::from_raw_parts(init_binary, init_binary_len));
Contributor

@d-e-s-o d-e-s-o Mar 18, 2026

I feel a bit uneasy about this potentially sizable copy (and heap allocation; for something that with great likelihood is already statically included in the binary anyway). It strikes me as something we should try and avoid, if at all possible. Have you considered replacing the Arc<[u8]> construct with a &'static [u8]? Would it impede usability significantly?

Collaborator

I think I would prefer the &'static [u8] too.

'static is currently the correct lifetime, because we exit the process once VM quits, but this will probably change in the future and this may become more unsafe. For a C API like we have now, having a huge static buffer be caller owned is reasonable behavior IMO. (I wonder though if it will be possible to design the Rust API so it's possible to name a 'vmm lifetime...?)

Contributor Author

@ggoodman ggoodman Mar 18, 2026

OK, so I have tried addressing this in fcd6927 without trying to pretend to own the memory range supplied to the C API by using an internal enum of owned vs static byte slices. The init binary that is now embedded in krun_init is of the static variant, allowing a zero-copy use case. However, if folks supply an init binary via krun_set_init(), it will be treated as owned and will result in a copy.

Later, if we have a native Rust API, this will allow embedders of libkrun to use the include_bytes! macro and the Static() variant if they want.

Contributor

> 'static is currently the correct lifetime, because we exit the process once VM quits, but this will probably change in the future and this may become more unsafe.

I'd think 'static would be correct no matter what, right? It would just over-constrain. However, I don't think you can have non-static lifetimes on global data in Rust (and at least currently there is still a bunch of global state, it seems).

That being said, the most flexibility for users (and, hence, future-proof API design) would perhaps be achieved by some custom enum à la:

enum InitBinary {
  Static(&'static [u8]),
  Dynamic(Arc<[u8]>),
}

Personally I'd probably still err on the side of just using &'static [u8] to force callers to think about what they are doing, but I can't anticipate how this API will be used most frequently.
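To make the trade-off concrete, here is a sketch of how such an enum could be consumed uniformly. The enum mirrors the one above; the `as_bytes` accessor is an assumption added for illustration.

```rust
use std::sync::Arc;

pub enum InitBinary {
    /// Borrowed for the whole program, e.g. from `include_bytes!`; no copy.
    Static(&'static [u8]),
    /// Heap-allocated and reference-counted, e.g. bytes loaded at runtime.
    Dynamic(Arc<[u8]>),
}

impl InitBinary {
    /// Hypothetical accessor: view either variant as a plain byte slice.
    pub fn as_bytes(&self) -> &[u8] {
        match self {
            InitBinary::Static(bytes) => *bytes,
            InitBinary::Dynamic(bytes) => &bytes[..],
        }
    }
}
```

Consumers such as the fs device would only ever call `as_bytes`, so the choice between the two variants stays with whoever constructs the value.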

Contributor

> OK, so I have tried addressing this in fcd6927 without trying to pretend to own the memory range supplied to the C API by using an internal enum of owned vs static byte slices.

Thanks! Do we still need the LazyLock thingy? Constructing an enum variant with static data in a const context should be fine, but I haven't tried removing it.

Collaborator

@mtjhrc mtjhrc Mar 18, 2026

Ah, we misunderstood each other.
By "caller owned" I meant that caller owns the memory (possibly dynamically allocated or their static memory) and guarantees it will exist during the VMM lifetime. In that case krun_set_init doesn't need to copy it (so it actually corresponds to your static case).

Both have pros and cons; I am not strongly in favor of one or the other.

Contributor

> By "caller owned" I meant that caller owns the memory (possibly dynamically allocated or their static memory) and guarantees it will exist during the VMM lifetime.

Personally, I'd also prefer these semantics.

Contributor Author

Oh I see. I did not anticipate that as the default.

In that case, perhaps the union type is not necessary and the ownership requirements can simply be documented on the C API.

It would state that the caller guarantees that the memory will outlive the VM.

So that leaves some options:

  1. Status quo.
  2. Keep the union type with the expectation that ::Owned() may be helpful for some use cases of a future Rust API, but switch the new C API method to assume ::Static().
  3. Rip out the whole union type and switch everything conceptually to the ::Static().
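Under the caller-owned semantics discussed above, the C entry point would borrow the caller's buffer instead of copying it. The sketch below uses a hypothetical helper, `borrow_init_binary`, to show where the documented guarantee turns into a `'static` borrow; it is not the PR's implementation.

```rust
use std::slice;

/// Hypothetical helper: turn a caller-owned buffer into a `'static` slice
/// without copying. Returns `None` for a null pointer or an empty buffer.
///
/// # Safety
/// `ptr` must point to `len` readable bytes that stay valid and unmodified
/// for as long as the VM (in practice, the process) may read them.
pub unsafe fn borrow_init_binary(ptr: *const u8, len: usize) -> Option<&'static [u8]> {
    if ptr.is_null() || len == 0 {
        return None;
    }
    // SAFETY: validity and lifetime are guaranteed by the caller (see above).
    Some(unsafe { slice::from_raw_parts(ptr, len) })
}
```

The lifetime here is a promise rather than something the compiler can check, so the documentation on the C API carries the whole contract.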

Collaborator

>   3. Rip out the whole union type and switch everything conceptually to the ::Static().

I'd go for this

Contributor Author

...then so shall it be (as of b89d788)!

@ggoodman
Contributor Author

ggoodman commented Apr 2, 2026

@mtjhrc / @slp is this something you're willing to entertain in 1.x?

@mtjhrc
Collaborator

mtjhrc commented Apr 7, 2026

Personally, I would be ok with this for 1.18 (last feature release of 1.x branch), because the feature is small and isolated. Please squash the commits and directly introduce just the final simple &'static version only. (instead of the Arc/lazylock/enum sidequests).

@ggoodman
Contributor Author

ggoodman commented Apr 7, 2026

> Please squash the commits and directly introduce just the final simple &'static version only. (instead of the Arc/lazylock/enum sidequests).

Rebased and pushed.

I saw that you merged the crate PR (#588) and I strongly believe that this needs to land in the same version as that or it will result in a breaking change.

Comment on lines +638 to +648
match CTX_MAP.lock().unwrap().entry(ctx_id) {
    Entry::Occupied(mut ctx_cfg_entry) => {
        let ctx_cfg = ctx_cfg_entry.get_mut();
        ctx_cfg.set_init_binary(payload);

        for fs_cfg in &mut ctx_cfg.vmr.fs {
            fs_cfg.init_payload = Some(payload);
        }
    }
    Entry::Vacant(_) => return -libc::ENOENT,
}
Collaborator

Instead of having to save the init_binary in both the ctx_cfg and fs_cfg, I would suggest that this just sets the init payload for the fs device with tag "/dev/root".
If such fs device doesn't exist this returns an error.

Contributor Author

@mtjhrc, I took a stab at making the suggested change and came to the conclusion that the juice is not worth the squeeze.

The main downside is that it would result in krun_set_init becoming order-dependent; embedders would need to have set up their root filesystem before calling krun_set_init. I think these two operations are probably conceptually orthogonal from a user's perspective so the order-dependence creates an unnecessary failure mode.

That being said, I'm not at all an expert on this codebase so maybe I'm not seeing the better approach or I'm projecting constraints that the maintainers don't impose upon themselves.

LMK.
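The order-independence argument can be illustrated with a small model. `CtxCfg`, `FsCfg`, `set_init`, and `add_fs` are simplified, hypothetical stand-ins, and the assumption that devices added later inherit the stored payload is mine, not something confirmed by the diff.

```rust
#[derive(Default)]
pub struct FsCfg {
    pub tag: String,
    pub init_payload: Option<&'static [u8]>,
}

#[derive(Default)]
pub struct CtxCfg {
    /// Remembered so that devices added later can pick it up (assumption).
    init_payload: Option<&'static [u8]>,
    pub fs: Vec<FsCfg>,
}

impl CtxCfg {
    /// Store the payload and push it to every fs device that already exists.
    pub fn set_init(&mut self, payload: &'static [u8]) {
        self.init_payload = Some(payload);
        for fs in &mut self.fs {
            fs.init_payload = Some(payload);
        }
    }

    /// Add an fs device; it inherits whatever payload is set at this point.
    pub fn add_fs(&mut self, tag: &str) {
        self.fs.push(FsCfg {
            tag: tag.to_string(),
            init_payload: self.init_payload,
        });
    }
}
```

In this model both call orders end in the same state, whereas a version that only patched the existing "/dev/root" device would have to fail when called first.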

Contributor Author

CI failure a flake? I can't re-run the tests myself.

Collaborator

> I think these two operations are probably conceptually orthogonal from a user's perspective so the order-dependence creates an unnecessary failure mode

Yeah that is a valid point too. Yeah I guess it's fine currently, we'll change it in libkrun 2.x anyway.

The current API is a mess: some things are order-dependent, some are not. Ideally this would take a handle to some sort of filesystem device object (some functions already work on handles, or rather IDs, e.g. krun_add_console_port_*), but the filesystem creation functions don't return the ID (index).

> CI failure a flake? I can't re-run the tests myself.

Yes some sort of race condition causes an ENOMEM in the guest kernel (we had a similar issue before related to host kernel, but this seems different). Unrelated to this PR.

}

fn init_payload(&self) -> io::Result<&[u8]> {
self.cfg.init_payload.ok_or_else(ebadf)
Contributor

Is EBADF really an appropriate error code to return here? I feel like ENOENT is more common for this kind of condition, perhaps ENOMEDIUM if we'd want something more unique.
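A sketch of the suggested alternative, assuming the payload is an `Option<&'static [u8]>`: the `ENOENT` constant is hard-coded to its Linux value to keep the example dependency-free (real code would more likely take it from the libc crate), and the free function stands in for the method shown in the diff.

```rust
use std::io;

/// Linux value of ENOENT, hard-coded for this sketch only.
const ENOENT: i32 = 2;

/// Return the configured payload, or "no such file or directory" when the
/// synthetic init file was never set up.
pub fn init_payload(payload: Option<&'static [u8]>) -> io::Result<&'static [u8]> {
    payload.ok_or_else(|| io::Error::from_raw_os_error(ENOENT))
}
```

From the guest's point of view this reads as the file simply not existing, rather than as a bad file descriptor.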

Contributor Author

Done in e129fd6. Will rebase if you confirm that it looks good to you, @d-e-s-o.

Contributor

Looks good as well, thanks!

Contributor Author

Rebased. Thanks.


 if !status.success() {
-    panic!("failed to compile init/init.c: {status}");
+    return None;
Contributor

I'd probably keep the existing behavior: if the build fails that should be signaled with a dedicated error message, in my opinion. That decouples failure from optionality more cleanly, I'd say.
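One way to keep failure and optionality separate is sketched below. `build_init` and its closure parameter are hypothetical and abstract away the actual compiler invocation in the build script; only the panic message is taken from the diff.

```rust
use std::path::PathBuf;

/// Build the init binary if it was requested.
///
/// `None` means "not requested" and nothing else; a failed compile is a
/// hard error with its own message instead of being folded into `None`.
pub fn build_init<F>(requested: bool, compile: F) -> Option<PathBuf>
where
    F: FnOnce() -> Result<PathBuf, String>,
{
    if !requested {
        // Optionality: nothing to build, and that is not an error.
        return None;
    }
    match compile() {
        Ok(path) => Some(path),
        // Failure: the build was requested, so abort loudly.
        Err(reason) => panic!("failed to compile init/init.c: {reason}"),
    }
}
```

Callers can then treat `None` purely as a configuration outcome, without having to guess whether a compiler error was swallowed.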

Contributor Author

Agree. Working on it.

Contributor Author

I'll rebase this if it looks good. I kept it separate to make it easy for you to review in isolation: 5821070

Contributor

Seems fine to me, thanks!

Contributor Author

Rebased. Thanks.

Comment on lines +239 to +248
#[cfg(not(feature = "tee"))]
fn set_init_binary(&mut self, init_binary: &'static [u8]) {
    self.init_payload = Some(init_binary);
}

#[cfg(not(feature = "tee"))]
fn get_init_binary(&self) -> &'static [u8] {
    self.init_payload.unwrap_or(DEFAULT_INIT_PAYLOAD)
}

Contributor

Not familiar with the tee feature per se, but just pointing out that Cargo features should be additive, or problems await. It seems that is violated here, because enabling a feature removes functionality. Perhaps it's fine if it's a niche thing for a very specific use case (or perhaps there is precedent in the repository already), but just pointing that out.

Collaborator

Yes, it violates it here, but unfortunately so do many other places in the libkrun code base. This means it will cause problems if multiple crates in the compilation use libkrun and features get unified 🫤 Also, I think some functions might do something different based on which features are enabled. This is something we need to fix, but I don't see us being able to do this in libkrun 1.x.

@ggoodman ggoodman force-pushed the feat/init-api branch 2 times, most recently from e129fd6 to ab92f2b Compare April 8, 2026 14:47
@ggoodman
Contributor Author

ggoodman commented Apr 8, 2026

Need me to rebase off main?

Signed-off-by: Geoffrey Goodman <geoff@goodman.dev>
4 participants