feat(vm-driver): Support additional virtiofs shares for directory volume mounting in MicroVM sandboxes #1509

### Problem Statement

The VM driver (`openshell-driver-vm`) on macOS ARM64 intentionally restricts virtiofs to a **single device** — the sandbox rootfs. We understand this is a deliberate security decision: the upload/download mechanism through the gateway is the intended path for host file access, keeping the sandbox isolated from the host filesystem by design.

We'd like to propose an option to relax this restriction for directory volume mounting in MicroVM sandboxes — specifically for local, single-user desktop scenarios where the sandbox and host are the same physical machine.

**Why this matters for long-lived sandbox sessions:**

The `sandbox upload`/`sandbox download` path works well for small, discrete file transfers. But for long-lived sandbox sessions used for computer use — where the sandbox needs ongoing access to the user's files — the batch-copy model has significant limitations:

- **Large working directories** — A user's home directory or workspace can easily be hundreds of gigabytes. Uploading the entire tree before the sandbox can read a single file isn't practical, and the caller often doesn't know which files will be needed upfront.
- **No incremental access** — Tools can't `cat` one file without pre-staging the containing directory. There's no on-demand, file-at-a-time fetch that's transparent to the processes running in the sandbox.
- **Stale data** — Once uploaded, the sandbox holds a point-in-time snapshot. If the user edits files on the host during the session (which happens regularly in long-lived sessions), the sandbox doesn't see those changes. There's no live view.
- **Bidirectional sync complexity** — After files are modified inside the sandbox, determining which files changed and downloading them back requires diffing or transferring everything. This becomes the caller's problem to solve.
- **Double the storage** — Uploaded files exist on the host and in the sandbox rootfs. Since the sandbox rootfs is typically memory-backed, large uploads consume the VM's memory budget.

A native virtiofs mount would eliminate all of these: the VM gets a live, transparent window into a host directory, with no copies, no staging, and no sync logic, just standard POSIX filesystem operations.

We acknowledge this is a tradeoff — virtiofs grants broader filesystem access than the upload/download model. The question is whether the MicroVM driver's security posture (hypervisor isolation, Landlock enforcement) makes that tradeoff viable in ways that the K3s model couldn't. We think it's worth exploring, and we'd appreciate the team's perspective.

### Proposed Design

### Core ask: opt-in directory volume mounting in MicroVM mode

We're requesting an opt-in mechanism to allow additional virtiofs mounts in MicroVM sandboxes for local desktop use. We recognize this is currently restricted for good security reasons, and we're not proposing it be the default behavior — this would be an explicit, policy-controlled option.

The public `containers/libkrun` supports multiple virtiofs devices on Linux. NVIDIA's fork intentionally restricts this to a single device for security. We're asking whether this restriction could be relaxed as an opt-in capability, gated by policy, for local MicroVM deployments.
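A minimal sketch of what "opt-in, gated by policy" could mean on the host side, assuming a hypothetical `VolumePolicy` (neither the type nor its fields exist in OpenShell today). With the default policy, any request for an extra share is refused, so behavior stays identical to the current single-device restriction:

```rust
/// Hypothetical host-side policy; not an existing OpenShell setting.
#[derive(Debug, Default, Clone, Copy)]
pub struct VolumePolicy {
    /// Extra virtiofs shares are refused unless this is explicitly enabled.
    pub allow_volume_mounts: bool,
    /// Even when shares are allowed, write access needs a second opt-in.
    pub allow_writable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VolumeRequest {
    pub directory: String,
    pub guest_path: String,
    pub read_only: bool,
}

#[derive(Debug, PartialEq)]
pub enum PolicyError {
    VolumesDisabled,
    WriteNotPermitted(String),
}

/// Accepts the requested shares only if the policy allows all of them.
/// An empty request list is always fine (today's rootfs-only configuration).
pub fn check_policy(policy: VolumePolicy, requests: &[VolumeRequest]) -> Result<(), PolicyError> {
    if requests.is_empty() {
        return Ok(());
    }
    if !policy.allow_volume_mounts {
        return Err(PolicyError::VolumesDisabled);
    }
    for r in requests {
        if !r.read_only && !policy.allow_writable {
            return Err(PolicyError::WriteNotPermitted(r.guest_path.clone()));
        }
    }
    Ok(())
}
```

Keeping write access behind its own flag means a policy that enables volumes still cannot be used to grant writes by accident.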

### Interface layers

The directory volume capability could surface at multiple levels of the stack — each complementary, serving a different user:

**Driver-level — `platform_config` passthrough:**

`DriverSandboxTemplate` already has an opaque `platform_config` Struct field passed to the driver. The VM driver could accept a `volume_mounts` key with environment variable resolution:

```json
{
  "volume_mounts": [
    {"directory": "$HOME", "mount_tag": "home", "guest_path": "/home/user"},
    {"directory": "/Users/dev/workspace", "mount_tag": "workspace", "guest_path": "/workspace"}
  ]
}
```

The driver resolves `$HOME` (or any env var) at sandbox creation time, calls `krun_add_virtiofs2` (or `krun_add_virtiofs3` for read-only shares) for each entry, and the sandbox init script mounts each tagged device at its `guest_path`. This requires no changes to the public gRPC API.
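One way the driver could resolve `$HOME` and similar references, sketched with only the Rust standard library. `expand_vars` and `expand_from_env` are hypothetical helpers; the lookup is injected so the logic can be tested without touching the real environment. An unset variable is treated as an error rather than an empty string, so a missing `$HOME` fails closed instead of silently turning `$HOME/workspace` into `/workspace` on the host:

```rust
/// Expands `$NAME` and `${NAME}` using `lookup`. Unset variables are errors.
pub fn expand_vars<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Support both `$NAME` and `${NAME}`.
        let braced = chars.peek() == Some(&'{');
        if braced {
            chars.next();
        }
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n.is_ascii_alphanumeric() || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break;
            }
        }
        if braced && chars.next() != Some('}') {
            return Err(format!("unterminated ${{...}} in {input:?}"));
        }
        if name.is_empty() {
            return Err(format!("empty variable name in {input:?}"));
        }
        match lookup(&name) {
            Some(v) => out.push_str(&v),
            None => return Err(format!("environment variable {name} is not set")),
        }
    }
    Ok(out)
}

/// Resolves against the driver's real process environment.
pub fn expand_from_env(input: &str) -> Result<String, String> {
    expand_vars(input, |k| std::env::var(k).ok())
}
```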

**Gateway-level — CLI flag:**

```bash
# Mount a specific directory
openshell-gateway --drivers vm --vm-volume /Users/dev/workspace:/workspace

# Mount $HOME (expanded by the shell when the gateway starts)
openshell-gateway --drivers vm --vm-volume "$HOME:/home/user"
```

The flag name `--vm-volume` mirrors the Docker/Podman `-v`/`--volume` convention (`SOURCE:TARGET`) without implying a specific source type.
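A sketch of how the gateway might parse a `--vm-volume` value. The optional `:ro`/`:rw` suffix is an assumption borrowed from Docker's `-v` syntax, and the default here is read-only, in line with the read-only default proposed under filesystem access control. Because it splits on `:`, this sketch does not support host paths that contain a colon:

```rust
#[derive(Debug, PartialEq)]
pub struct VmVolume {
    pub host: String,
    pub guest: String,
    pub read_only: bool,
}

/// Parses "HOST:GUEST", "HOST:GUEST:ro" or "HOST:GUEST:rw" (hypothetical syntax).
pub fn parse_vm_volume(spec: &str) -> Result<VmVolume, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    let (host, guest, mode) = match parts.as_slice() {
        [h, g] => (*h, *g, None),
        [h, g, m] => (*h, *g, Some(*m)),
        _ => return Err(format!("expected HOST:GUEST[:ro|rw], got {spec:?}")),
    };
    // By the time the flag is parsed, "$HOME" has already been expanded by the shell.
    if !host.starts_with('/') || !guest.starts_with('/') {
        return Err(format!("both paths must be absolute in {spec:?}"));
    }
    let read_only = match mode {
        None | Some("ro") => true, // read-only unless write access is requested explicitly
        Some("rw") => false,
        Some(other) => return Err(format!("unknown mode {other:?} in {spec:?}")),
    };
    Ok(VmVolume { host: host.to_string(), guest: guest.to_string(), read_only })
}
```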

**API-level — proto extension:**

A `volume_mounts` field on `SandboxTemplate`, similar to how `volume_claim_templates` works for K8s:

```protobuf
message VolumeMount {
  string directory = 1;      // source directory path (supports $HOME, $WORKSPACE, etc.)
  string guest_path = 2;     // mount point inside the sandbox
  bool read_only = 3;
  string mount_tag = 4;      // virtiofs tag (auto-generated if empty)
}

message SandboxTemplate {
  // ... existing fields ...
  repeated VolumeMount volume_mounts = N;
}
```
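Since `mount_tag` is auto-generated when empty, the driver needs a small assignment step. A hypothetical version might look like the following. The `volN` naming scheme is an assumption, the reserved `/dev/root` tag is the rootfs tag described in the libkrun API notes, and the 36-byte limit reflects the size of the `tag` field in the virtio-fs device configuration space:

```rust
use std::collections::HashSet;

/// Tag used by the existing rootfs device; never reusable for an extra share.
const ROOT_TAG: &str = "/dev/root";
/// Size of the `tag` field in the virtio-fs device configuration space.
const MAX_TAG_LEN: usize = 36;

#[derive(Debug, Clone)]
pub struct Mount {
    pub mount_tag: String, // empty means "auto-generate"
    pub guest_path: String,
}

/// Fills empty tags with "vol0", "vol1", ... and rejects reserved, duplicate or
/// over-long tags, plus duplicate guest paths (a later mount would shadow an earlier one).
pub fn assign_tags(mounts: &mut [Mount]) -> Result<(), String> {
    let mut used: HashSet<String> = HashSet::new();
    used.insert(ROOT_TAG.to_string());

    // Pass 1: explicit tags must be unique, short enough, and not the rootfs tag.
    for m in mounts.iter() {
        if m.mount_tag.is_empty() {
            continue;
        }
        if m.mount_tag.len() > MAX_TAG_LEN || !used.insert(m.mount_tag.clone()) {
            return Err(format!("reserved, duplicate or too long mount_tag {:?}", m.mount_tag));
        }
    }

    // Pass 2: generate tags for entries that left mount_tag empty, skipping taken names.
    let mut next = 0u32;
    for m in mounts.iter_mut().filter(|m| m.mount_tag.is_empty()) {
        loop {
            let candidate = format!("vol{next}");
            next += 1;
            if used.insert(candidate.clone()) {
                m.mount_tag = candidate;
                break;
            }
        }
    }

    // Guest paths must be distinct and must not replace the root mount.
    let mut paths = HashSet::new();
    for m in mounts.iter() {
        if m.guest_path == "/" || !paths.insert(m.guest_path.as_str()) {
            return Err(format!("invalid or duplicate guest_path {:?}", m.guest_path));
        }
    }
    Ok(())
}
```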

These are complementary interfaces to the same underlying capability — multi-device virtiofs in the VM driver.

### Filesystem access control

Even with a virtiofs mount, the sandbox doesn't need unrestricted access. The mounted directory can be scoped using existing Linux security mechanisms inside the VM:

- **Landlock LSM** — restrict the mounted path to a specific subtree (e.g., only `/home/user/workspace`), with configurable read/write/execute permissions. This is the same enforcement model the sandbox already uses for its rootfs.
- **Path guard policies** — an allowlist of accessible paths within the mount, with a denylist for sensitive locations (`.ssh/`, `.aws/`, `.kube/`, `.gnupg/`, browser profiles, credential stores). The caller defines the policy, the sandbox enforces it.
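- **Read-only default** — shares should be read-only unless write access is explicitly requested. Since proto3 `bool` fields always default to `false`, the `VolumeMount` proto would need either an inverted field (e.g. a hypothetical `bool writable`) or `optional bool read_only` with an unset value treated as read-only.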

The sandbox is *contained but not blind* — tools can see and operate on the user's files without being able to reach sensitive host directories. The hypervisor boundary provides the hard isolation; Landlock provides the fine-grained scoping within the mount.
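The path-guard idea can be sketched as a caller-side pre-filter. This is a lexical check only: it normalizes `.` and `..` but does not resolve symlinks, which is one reason the actual enforcement would still belong to Landlock inside the VM. `PathGuard` and its fields are hypothetical:

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalizes an absolute path: drops "." and applies ".." without touching
/// the filesystem. Symlinks are NOT resolved.
fn normalize(p: &Path) -> Option<PathBuf> {
    if !p.is_absolute() {
        return None;
    }
    let mut out = PathBuf::from("/");
    for c in p.components() {
        match c {
            Component::RootDir | Component::CurDir => {}
            Component::ParentDir => {
                out.pop(); // popping "/" is a no-op, so ".." cannot escape the root
            }
            Component::Normal(s) => out.push(s),
            Component::Prefix(_) => return None, // Windows-only, never valid here
        }
    }
    Some(out)
}

pub struct PathGuard {
    /// Subtrees the sandbox may touch, e.g. /home/user/workspace.
    pub allow: Vec<PathBuf>,
    /// Path components that are always denied, e.g. ".ssh", ".aws".
    pub deny_names: Vec<String>,
}

impl PathGuard {
    pub fn permits(&self, p: &str) -> bool {
        let Some(p) = normalize(Path::new(p)) else {
            return false;
        };
        // Path::starts_with compares whole components, so /home/username is not under /home/user.
        let inside = self.allow.iter().any(|a| p.starts_with(a));
        let denied = p.components().any(|c| {
            matches!(c, Component::Normal(s) if self.deny_names.iter().any(|d| s == d.as_str()))
        });
        inside && !denied
    }
}
```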

### Alternatives Considered

### 1. Upload/download via gateway (current workaround suggested in #500)

As described in the problem statement, `sandbox upload`/`sandbox download` works for small file transfers but breaks down for long-lived computer use sessions. No incremental access, stale data, bidirectional sync complexity, and double storage.

### 2. SSH/SFTP bridge (our current production workaround)

We embedded a lightweight SSH/SFTP server (Rust `russh` crate) in our application that runs on the host. The sandbox connects to it via the OpenShell network proxy. This provides policy-enforced host file access with:
- Path jail to `$HOME` only
- Command allowlist (cat, ls, head, grep, stat, mkdir, cp, mv, echo, touch, etc.)
- Sensitive-path denylist (`.ssh`, `.aws`, `.kube`, `.env`, browser profiles)

This works but adds ~100ms latency per file operation and significant architectural complexity vs. a native virtiofs mount. It also doesn't provide POSIX filesystem semantics — tools like `git`, `npm`, and build systems expect a real filesystem, not an SFTP channel.

### 3. FUSE-based approach (mentioned in #500 by @sandys)

Using `/dev/fuse` inside the sandbox to mount a user-space filesystem. This would require device passthrough (`--device /dev/fuse`), seccomp allowlist changes for `mount()`/`umount2()`, and a FUSE daemon implementation. More complex than virtiofs and adds another trust boundary.

### Why virtiofs is the right mechanism here

The rootfs is *already* virtiofs-mounted — a second device uses the exact same trust model and code path. It's not introducing a new mechanism; it's extending one that's already proven and running in production. The hypervisor mediates the access, and the VM kernel handles caching and coherency natively.

### Agent Investigation

### libkrun virtiofs API

The upstream [`containers/libkrun`](https://github.com/containers/libkrun) exposes a family of functions for mounting host directories into the guest VM via virtiofs ([`include/libkrun.h`](https://github.com/containers/libkrun/blob/main/include/libkrun.h)):

```c
#include <stdbool.h>
#include <stdint.h>

// Mount a host directory into the guest, identified by a tag
int32_t krun_add_virtiofs(uint32_t ctx_id,
                          const char *c_tag,    // tag to identify the mount in the guest
                          const char *c_path);  // host directory to expose

// + DAX shared memory window size for faster I/O
int32_t krun_add_virtiofs2(uint32_t ctx_id,
                           const char *c_tag,
                           const char *c_path,
                           uint64_t shm_size);

// + read-only flag (most complete variant)
int32_t krun_add_virtiofs3(uint32_t ctx_id,
                           const char *c_tag,
                           const char *c_path,
                           uint64_t shm_size,
                           bool read_only);
```

These can be called **multiple times** to add multiple virtiofs devices to a single VM context. The rootfs is mounted as a virtiofs device (tagged `/dev/root`), and additional shares use custom tags (e.g., `homefs`). In upstream libkrun on macOS ARM64, multiple devices work correctly.
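A sketch of the host-side loop, one `krun_add_virtiofs3` call per share. To stay testable without linking libkrun, the function takes the C entry point as a function pointer whose type mirrors the declaration above; in a real driver, `add` would be the symbol from an `extern "C"` block. `Share` and `add_shares` are hypothetical names, and the sketch assumes libkrun copies the tag and path strings during the call (if it retains the pointers, the `CString`s would need to live until `krun_start_enter`):

```rust
use std::ffi::{c_char, CString};

/// Mirrors: int32_t krun_add_virtiofs3(uint32_t, const char *, const char *, uint64_t, bool)
pub type AddVirtiofs3 =
    unsafe extern "C" fn(u32, *const c_char, *const c_char, u64, bool) -> i32;

pub struct Share<'a> {
    pub tag: &'a str,
    pub host_path: &'a str,
    pub read_only: bool,
}

/// Adds each share as its own virtiofs device, stopping at the first failure.
pub fn add_shares(ctx_id: u32, shares: &[Share], shm_size: u64, add: AddVirtiofs3) -> Result<(), String> {
    for s in shares {
        // CString::new rejects interior NUL bytes, which C would silently truncate.
        let tag = CString::new(s.tag).map_err(|e| e.to_string())?;
        let path = CString::new(s.host_path).map_err(|e| e.to_string())?;
        // SAFETY: both pointers are valid, NUL-terminated, and outlive the call.
        let ret = unsafe { add(ctx_id, tag.as_ptr(), path.as_ptr(), shm_size, s.read_only) };
        if ret < 0 {
            return Err(format!("krun_add_virtiofs3 for tag {:?} failed: {}", s.tag, ret));
        }
    }
    Ok(())
}
```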

### Observed behavior (macOS ARM64, OpenShell v0.0.40)

When attempting to add a second virtiofs device to an existing sandbox configuration:

```bash
# Environment
# - macOS 15.x, Apple Silicon (M1/M2/M3/M4)
# - Driver: openshell-driver-vm (libkrun, Apple Hypervisor.framework)
# - Kernel inside VM: Linux 6.12 aarch64
# - Root filesystem: virtiofs (/dev/root on / type virtiofs) — works correctly

# 1. Confirm sandbox boots and rootfs virtiofs works
openshell sandbox create --image sandbox:v1
openshell sandbox exec -- mount | grep virtiofs
# Output: /dev/root on / type virtiofs (rw,relatime)

# 2. Attempt to add a second virtiofs device in the driver's FFI layer
# In openshell-driver-vm ffi.rs:
#   let ret = unsafe { krun_add_virtiofs2(ctx_id, c_tag.as_ptr(), c_path.as_ptr(), shm_size) };
#   // ret == 0 (success)
#   let ret = unsafe { krun_start_enter(ctx_id) };
#   // ret == -22 (EINVAL): VM fails to start

# 3. Tested with multiple host paths (/Users/$USER, /tmp): same EINVAL, ruling out macOS TCC privacy permissions as the cause
```
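As a small diagnostic aid, the driver could surface libkrun return codes as `std::io::Error`s, assuming (as the `-22` above suggests) that failures are reported as negative errno values. `krun_result` is a hypothetical helper:

```rust
use std::io;

/// Maps a libkrun-style return code (0 = success, -errno = failure) to io::Result.
pub fn krun_result(ret: i32) -> io::Result<()> {
    if ret >= 0 {
        Ok(())
    } else {
        // saturating_neg avoids overflow on i32::MIN; from_raw_os_error keeps the errno.
        Err(io::Error::from_raw_os_error(ret.saturating_neg()))
    }
}
```

With this, a failed `krun_start_enter` logs as "Invalid argument" with the raw errno attached, rather than a bare `-22`.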

### Codebase findings

- The `krun_add_virtiofs2` **symbol exists** in the bundled `libkrun.dylib` — the upstream API supports multiple virtiofs devices, but NVIDIA's fork intentionally restricts it to a single device (the rootfs)
- This is a **deliberate security decision**, not a bug — the upload/download mechanism through the gateway is the intended path for host file access, keeping the sandbox isolated from the host filesystem by design
- The rootfs uses virtiofs (`rootfstype=virtiofs rw` in kernel cmdline), so the driver and guest kernel both support the mechanism — the restriction is enforced at the VM configuration layer
- NVIDIA's macOS libkrun fork adds `krun_add_net_unixgram` for macOS gvproxy networking, which doesn't exist in the upstream `containers/libkrun` — we cannot swap in the upstream build
- We have a proof-of-concept showing the client-side FFI changes, but the restriction would need to be relaxed in the driver itself
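On the guest side, each extra share would be an ordinary virtiofs mount by tag, just as the rootfs is mounted from its `/dev/root` tag. A hypothetical helper for the init-script step that renders the matching `fstab(5)` line (equivalent to `mount -t virtiofs -o ro home /home/user`) might look like this; the `nodev,nosuid` options are an assumed hardening choice, not something the current init script is known to use:

```rust
/// Renders an fstab(5) entry mounting the virtiofs device `tag` at `guest_path`.
/// For virtiofs, the first fstab field is the device tag rather than a block device.
pub fn fstab_line(tag: &str, guest_path: &str, read_only: bool) -> Result<String, String> {
    // fstab fields are whitespace-separated; reject instead of octal-escaping to keep this small.
    let bad = |s: &str| s.is_empty() || s.chars().any(char::is_whitespace);
    if bad(tag) || bad(guest_path) || !guest_path.starts_with('/') {
        return Err(format!("cannot render fstab entry for {tag:?} at {guest_path:?}"));
    }
    // Assumed hardening: ignore device nodes and setuid bits on files from the host share.
    let mode = if read_only { "ro" } else { "rw" };
    Ok(format!("{tag} {guest_path} virtiofs {mode},nodev,nosuid 0 0"))
}
```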

### Security model comparison (MicroVM vs K8s)

This is related to [#500](https://github.com/NVIDIA/OpenShell/issues/500), which was closed with valid security concerns about K8s `hostPath` volumes. We're raising it again specifically for the MicroVM driver, where the isolation model differs:

| Concern | K8s mode | MicroVM mode |
|---|---|---|
| **Isolation boundary** | Shared kernel — `hostPath` is a known container escape vector | Hypervisor-isolated: the VM has its own kernel, and virtiofs requests are served by libkrun's virtiofs device on the host, with the VM running on Apple Hypervisor.framework |
| **Deployment scope** | K8s pods can run on remote nodes in a cluster | MicroVM via libkrun is inherently local — only runs on the user's physical machine |
| **Existing precedent** | `hostPath` introduces a new trust boundary | The rootfs is *already* virtiofs-mounted — a second device uses the same mechanism and trust model |
| **Cloud implications** | Exposing host paths limits cloud portability | No cloud path exists for MicroVM mode — this is a local-machine-only driver |

We understand if the answer is still the same — but we wanted to present the MicroVM-specific context for the team's consideration.

### Related Issues

- [#500](https://github.com/NVIDIA/OpenShell/issues/500) — `feat: Support host directory mounts (hostPath/PVC) via CLI` — Same fundamental need, closed with the recommendation to explore upload/download improvements. We're raising the MicroVM-specific angle where the isolation model and deployment scope differ from K8s.
- [#1356](https://github.com/NVIDIA/OpenShell/issues/1356) — `openshell-driver-vm 0.0.39 on macOS: sandbox init's chown -R over virtio-fs fails` — Related virtiofs issue on macOS
- [#1268](https://github.com/NVIDIA/OpenShell/issues/1268) — `feat: inject read-only files into sandbox at creation time` — Adjacent file injection mechanism

### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request
