Skip to content

fix(sandbox): add managed loopback proxy#1501

Open
ericksoa wants to merge 3 commits into
mainfrom
fix/managed-loopback-proxy
Open

fix(sandbox): add managed loopback proxy#1501
ericksoa wants to merge 3 commits into
mainfrom
fix/managed-loopback-proxy

Conversation

@ericksoa
Copy link
Copy Markdown
Collaborator

@ericksoa ericksoa commented May 21, 2026

Summary

  • Add an OpenShell-managed sandbox-local loopback HTTP proxy URL exposed as OPENSHELL_LOOPBACK_PROXY_URL.
  • Keep the namespace boundary explicit: the sandbox-netns acceptor thread enters setns(), binds 127.0.0.1:<port>, accepts client sockets, and hands accepted sockets out.
  • Keep policy, DNS, SSRF checks, TLS/L7 handling, WebSocket text rewrite, credential resolution, and upstream TCP dialing in the supervisor dispatcher that uses the existing proxy path.
  • Keep HTTP_PROXY/HTTPS_PROXY behavior unchanged for ordinary proxy-aware clients, and keep the implementation generic with no Discord-specific routing or credential logic.

Feedback Addressed

  • John called out that only the loopback listener should live in the sandbox network namespace; DNS, policy handling, credential rewrite, and upstream dialing need to run from supervisor-side code.
  • The implementation now names and documents that split directly in crates/openshell-sandbox/src/proxy.rs: sandbox-netns acceptor vs. supervisor dispatcher.
  • Added e2e/rust/tests/loopback_proxy_netns.rs to prove the boundary behavior: a direct sandbox socket to the host-side target is rejected, while CONNECT through OPENSHELL_LOOPBACK_PROXY_URL reaches the same target successfully.
  • The concrete downstream consumer remains fix(openclaw): prefer OpenShell loopback proxy NemoClaw#4005: fix(openclaw): prefer OpenShell loopback proxy NemoClaw#4005

Validation

  • cargo fmt --all -- --check
  • rustfmt --edition 2024 --check e2e/rust/tests/loopback_proxy_netns.rs
  • git diff --check
  • cargo check -p openshell-sandbox
  • cargo clippy -p openshell-core -p openshell-sandbox --all-targets -- -D warnings
  • cargo clippy --manifest-path e2e/rust/Cargo.toml --features e2e-docker --test loopback_proxy_netns -- -D warnings
  • cargo test -p openshell-sandbox apply_loopback_proxy_env_exposes_managed_url_without_changing_proxy_vars --lib
  • cargo test -p openshell-sandbox websocket --lib
  • cargo test -p openshell-sandbox --test websocket_upgrade
  • Colima Docker E2E: DOCKER_HOST=unix://$HOME/.colima/default/docker.sock TMPDIR=<workspace tmp> PREBUILT_AUTO_STAGE=0 OPENSHELL_E2E_DOCKER_TEST=loopback_proxy_netns bash e2e/rust/e2e-docker.sh
  • Colima Docker E2E: DOCKER_HOST=unix://$HOME/.colima/default/docker.sock TMPDIR=<workspace tmp> PREBUILT_AUTO_STAGE=0 OPENSHELL_E2E_DOCKER_TEST=websocket_conformance bash e2e/rust/e2e-docker.sh

Notes:

  • cargo fmt --manifest-path e2e/rust/Cargo.toml --all -- --check currently reports pre-existing rustfmt drift across the standalone e2e crate, so the new E2E file was checked directly with rustfmt.
  • This local macOS/Colima machine cannot build the Linux arm64 supervisor without zig/cargo-zigbuild; the Colima runs used an existing Linux arm64 supervisor prebuilt from the prior PR head. The new head changes the loopback implementation names/comments and adds the proof test; CI should build the current-head supervisor normally.

@johntmyers
Copy link
Copy Markdown
Collaborator

@ericksoa I see this maps back to something for Discord in NemoClaw but can you expand on the requirement a bit more and the use case?

@ericksoa
Copy link
Copy Markdown
Collaborator Author

Good question. The OpenShell requirement here is not Discord-specific; Discord/OpenClaw is just the first concrete client that exposed the gap.

The generic use case is: a sandboxed app or SDK needs to use an HTTP CONNECT proxy, but the app will only accept a loopback proxy URL such as http://127.0.0.1:<port>. OpenShell already injects and owns the real gateway proxy path, but that address is not always accepted by application-level proxy config validators. In those cases we need a sandbox-local loopback adapter that still forwards into the normal OpenShell L7 proxy path, rather than making every integration ship its own little localhost proxy.

The Discord case is a good example because OpenClaw's Discord Gateway client ignores HTTP_PROXY/HTTPS_PROXY and uses the per-account Discord proxy config instead. That config only accepts loopback proxy URLs. At the same time, the Gateway path is a WebSocket path where OpenShell must remain in the middle so policy, the allowed 101 upgrade, WEBSOCKET_TEXT, and credential rewrite for the IDENTIFY payload all still happen centrally in OpenShell. Without this managed loopback listener, NemoClaw has to provide a temporary transport shim just to adapt 127.0.0.1:<port> back to the OpenShell proxy.

So the intended OpenShell contract is:

  • expose a lifecycle-bound sandbox-local proxy URL, currently via OPENSHELL_LOOPBACK_PROXY_URL;
  • bind it only on loopback inside the sandbox network namespace;
  • forward accepted sockets into the existing OpenShell proxy implementation;
  • keep policy, TLS handling, WebSocket handling, credential rewrite, and upstream dialing in the normal OpenShell path;
  • avoid any Discord-specific logic in OpenShell.

That gives clients with strict localhost proxy requirements a standard OpenShell affordance, while keeping the security boundary and protocol handling in one place.

@ericksoa
Copy link
Copy Markdown
Collaborator Author

For the concrete downstream context, the dependent NemoClaw PR is NVIDIA/NemoClaw#4005: NVIDIA/NemoClaw#4005

That PR consumes this OpenShell affordance by preferring OPENSHELL_LOOPBACK_PROXY_URL when generating the OpenClaw Discord account config, while keeping NemoClaw's existing helper as a compatibility fallback until NemoClaw can raise its minimum OpenShell pin past the release that includes this feature.

@ericksoa ericksoa self-assigned this May 21, 2026
@johntmyers
Copy link
Copy Markdown
Collaborator

Ok, understood on the use case. So the listener is placed on the sandbox loopback but then after setns() gets called, the thread stays there so the DNS lookup and upstream dial are attempting to happen from inside the restricted sandbox-side namespace. I would think the NFT rules are gonna see that as direct outbound traffic and block it.

The better impl is to keep only the loopback listener in the sandbox network namespace and then hand the accepted client socket back to code that's running in the supervisor network namesapce before DNS, policy handling, cred rewrite and upstream dialing.

@ericksoa
Copy link
Copy Markdown
Collaborator Author

Thanks, agreed. I updated the PR to encode that model directly:

  • the sandbox-netns acceptor thread enters setns(), binds loopback, accepts client sockets, and hands those accepted sockets out;
  • the supervisor dispatcher receives those sockets and runs the normal proxy path for DNS, policy handling, SSRF checks, TLS/L7, WebSocket text rewrite, credential resolution, and upstream dialing.

I also added a focused E2E proof: it starts a host-side test server, applies a policy allowing that host/port through the proxy, verifies a direct sandbox socket to the same target is refused, then verifies CONNECT through OPENSHELL_LOOPBACK_PROXY_URL reaches it successfully. I reran the WebSocket credential-rewrite E2E through the managed loopback path as well.

The downstream consumer remains NVIDIA/NemoClaw#4005.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants