Agent Diagnostic
Investigated the docker driver's gateway route selection in `crates/openshell-driver-docker/src/lib.rs`:
- `uses_host_gateway_alias()` (line 1280) detects Docker Desktop, Colima, Lima, Rancher Desktop, and OrbStack via the daemon OS string, hostname, and labels.
- Podman with libkrun on macOS reports `os: "linux"`, `hostname: "localhost.localdomain"`, and no matching labels, so none of the detection patterns match.
- `docker_gateway_route()` therefore falls through to `Bridge` mode, setting `bind_address` and `host_alias_ip` to the bridge gateway IP (`10.89.0.1`).
- `docker_extra_hosts()` maps `host.openshell.internal` to `10.89.0.1`, which is injected into `/etc/hosts` inside the container.
- On macOS/libkrun, `10.89.0.1` exists only inside the VM's network namespace and is not routable from containers.
- The supervisor inside the sandbox tries to connect to `https://host.openshell.internal:17670/`, which resolves to `10.89.0.1`, and fails with `failed to connect to OpenShell server`.
- Verified: containers CAN reach the macOS host via `host.containers.internal` (`192.168.127.254`), using `openssl s_client -connect 192.168.127.254:17670`.
- The podman driver (`crates/openshell-driver-podman`) handles this correctly: it auto-detects `host.containers.internal` for the gRPC endpoint (`driver.rs` line 232) and uses podman's `host-gateway` for `hostadd`.
- The docker driver's detection logic has no case for podman-backed runtimes on macOS.
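To make the fall-through concrete, here is a minimal sketch of the selection path described above. The struct, the enum, and the marker strings are hypothetical re-creations for illustration only; the real signatures and matching patterns in `lib.rs` may differ. It shows that Podman's `os`/`hostname` values match nothing, so the route becomes `Bridge(10.89.0.1)` and that IP ends up in the extra-hosts entry.

```rust
use std::net::IpAddr;

/// Hypothetical subset of the daemon info fields the diagnostic says are inspected.
#[derive(Debug)]
pub struct DaemonInfo {
    pub os: String,
    pub hostname: String,
    pub labels: Vec<String>,
}

/// Hypothetical route: alias to `host-gateway`, or a concrete bridge gateway IP.
#[derive(Debug, PartialEq, Eq)]
pub enum GatewayRoute {
    HostGatewayAlias,
    Bridge(IpAddr),
}

// ASSUMPTION: illustrative marker strings, not the actual patterns from lib.rs.
const VM_RUNTIME_MARKERS: &[&str] = &[
    "docker desktop",
    "colima",
    "lima",
    "rancher-desktop",
    "orbstack",
];

/// Case-insensitive substring match against the known VM-backed runtimes.
fn matches_vm_runtime(value: &str) -> bool {
    let value = value.to_ascii_lowercase();
    VM_RUNTIME_MARKERS.iter().any(|marker| value.contains(*marker))
}

/// True when the daemon looks like a VM-backed runtime that needs `host-gateway`.
pub fn uses_host_gateway_alias(info: &DaemonInfo) -> bool {
    matches_vm_runtime(&info.os)
        || matches_vm_runtime(&info.hostname)
        || info.labels.iter().any(|label| matches_vm_runtime(label))
}

/// Falls through to `Bridge` when no marker matches, as happens for Podman/libkrun.
pub fn docker_gateway_route(info: &DaemonInfo, bridge_gateway: IpAddr) -> GatewayRoute {
    if uses_host_gateway_alias(info) {
        GatewayRoute::HostGatewayAlias
    } else {
        GatewayRoute::Bridge(bridge_gateway)
    }
}

/// Builds the "name:target" extra-hosts entry that lands in the container's /etc/hosts.
pub fn docker_extra_hosts(route: &GatewayRoute) -> Vec<String> {
    let target = match route {
        GatewayRoute::HostGatewayAlias => "host-gateway".to_string(),
        GatewayRoute::Bridge(ip) => ip.to_string(),
    };
    vec![format!("host.openshell.internal:{target}")]
}
```

Feeding it the values Podman reports (`os: "linux"`, `hostname: "localhost.localdomain"`, no labels) yields `host.openshell.internal:10.89.0.1`, matching the `/etc/hosts` contents in the Logs section.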
Description
When using the Homebrew-installed OpenShell 0.0.46 on macOS with Podman (libkrun VM), sandboxes stay stuck in `Provisioning` indefinitely. The supervisor cannot connect back to the gateway because the docker driver maps `host.openshell.internal` to the bridge subnet gateway IP (`10.89.0.1`) instead of using `host-gateway`. That IP is reachable only inside the libkrun VM's network namespace, not from within containers.
Expected: The docker driver should detect Podman/libkrun on macOS and use `host.containers.internal` (or resolve `host-gateway` correctly) to route supervisor callbacks to the gateway.
Reproduction Steps
- macOS with Podman (libkrun), Homebrew-installed OpenShell 0.0.46
- `openshell sandbox create --from ghcr.io/nvidia/openshell-community/sandboxes/base:latest`
- Sandbox stays in `Provisioning`. Container logs show `Policy fetch failed` / `failed to connect to OpenShell server` in a loop.
- Inside the container, `/etc/hosts` maps `host.openshell.internal` to `10.89.0.1` (unreachable).
- `host.containers.internal` (`192.168.127.254`) is reachable, and the gateway responds on that IP.
Environment
- OS: macOS 26.5 (Darwin 25.5.0), arm64
- Podman: 5.8.2 (VM type: libkrun)
- OpenShell: 0.0.46 (Homebrew)
- Gateway config: default (`bind_address = 127.0.0.1:17670`, `[openshell.drivers.docker]`)
Logs
```
# Supervisor logs from inside the container:
openshell_sandbox: Policy fetch failed, retrying
openshell: log push connect failed: failed to connect to OpenShell server
# Repeats every ~2s until the container crashes

# /etc/hosts inside container:
10.89.0.1 host.docker.internal
10.89.0.1 host.openshell.internal # <-- unreachable from container
192.168.127.254 host.containers.internal # <-- reachable, but not used
```
Workaround
```toml
# ~/.config/openshell/gateway.toml
[openshell.gateway]
bind_address = "0.0.0.0:17670"

[openshell.drivers.docker]
host_gateway_ip = "192.168.127.254"
```

```
# ~/.config/openshell/gateway.env
OPENSHELL_BIND_ADDRESS=0.0.0.0
```
Suggested Fix
`uses_host_gateway_alias()` in the docker driver should detect podman-backed runtimes. Podman's Docker-compatible API reports `os: "linux"` and `hostname: "localhost.localdomain"` with no identifying labels, but the connection comes through a podman socket. The runtime could be detected by checking for podman-specific API headers, the socket path, or system info fields like `conmon_version` that only podman exposes.
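A minimal sketch of the detection idea, assuming the driver can collect a few signals when it connects. `RuntimeProbe`, `is_podman_backed`, and `extra_host_target` are hypothetical names, not existing OpenShell code. The `"Podman Engine"` component name in the compat `/version` response is an assumption to verify against the installed Podman; the conmon signal is reduced to a boolean because the exact field name depends on the endpoint queried.

```rust
use std::net::IpAddr;

/// Hypothetical signals the docker driver could gather about the runtime it talks to.
pub struct RuntimeProbe {
    /// Path of the Docker-compatible socket the client connected through.
    pub socket_path: String,
    /// Component names from the compat `/version` response.
    /// ASSUMPTION: Podman lists a component named "Podman Engine".
    pub version_components: Vec<String>,
    /// Whether the info payload exposed a podman-only field such as the conmon version.
    pub has_conmon_info: bool,
}

/// Heuristic: any single podman-specific signal is treated as sufficient.
pub fn is_podman_backed(probe: &RuntimeProbe) -> bool {
    probe.socket_path.to_ascii_lowercase().contains("podman")
        || probe
            .version_components
            .iter()
            .any(|c| c.as_str() == "Podman Engine")
        || probe.has_conmon_info
}

/// Chooses the target for the `host.openshell.internal` extra-hosts entry.
/// On macOS the bridge IP lives inside the VM, so podman-backed runtimes get
/// `host-gateway`, mirroring what the podman driver already does.
pub fn extra_host_target(probe: &RuntimeProbe, host_is_macos: bool, bridge_gateway: IpAddr) -> String {
    if host_is_macos && is_podman_backed(probe) {
        "host-gateway".to_string()
    } else {
        bridge_gateway.to_string()
    }
}
```

In a real driver, `host_is_macos` could come from `cfg!(target_os = "macos")`; it is a parameter here only so both branches can be exercised. Restricting the change to macOS keeps native Linux Podman, where the bridge IP is routable, on the existing path.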
Alternatively, the driver could probe connectivity to the bridge gateway IP before using it, falling back to `host-gateway` if the probe fails.
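The probe-and-fallback alternative could look roughly like this std-only sketch. `HostRoute` and `choose_host_route` are hypothetical names, and the probe is a plain TCP connect rather than a TLS or gRPC handshake.

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical outcome of the probe-then-fallback strategy.
#[derive(Debug, PartialEq, Eq)]
pub enum HostRoute {
    /// The bridge gateway accepted a connection; keep using its IP.
    Bridge(IpAddr),
    /// The bridge gateway did not answer; fall back to the `host-gateway` alias.
    HostGateway,
}

/// Attempts a plain TCP connect to `ip:port`, giving up after `timeout`.
fn tcp_reachable(ip: IpAddr, port: u16, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&SocketAddr::new(ip, port), timeout).is_ok()
}

/// Probes the bridge gateway before committing to it.
pub fn choose_host_route(bridge_gateway: IpAddr, gateway_port: u16, timeout: Duration) -> HostRoute {
    if tcp_reachable(bridge_gateway, gateway_port, timeout) {
        HostRoute::Bridge(bridge_gateway)
    } else {
        HostRoute::HostGateway
    }
}
```

Two caveats apply. The probe runs in the driver process on the host, not inside a container, so it only approximates the container's network path. It also succeeds only once the gateway is already listening on the bridge IP. Explicit runtime detection avoids both issues; the probe is better suited as a safety net than as the primary mechanism.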
Agent-First Checklist
`debug-openshell-cluster`, `debug-inference`, `openshell-cli`)