From d87ed9a6435961079b997b7723788bf3ae321954 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:02:13 +0000 Subject: [PATCH 1/8] docs: add agent guidance --- .agents/docs | 1 + .claude/docs/BOUNDARY_AGENT_GUIDE.md | 428 +++++++++++++++++++++++++++ AGENTS.md | 58 ++++ CLAUDE.md | 1 + e2e_tests/AGENTS.md | 69 +++++ 5 files changed, 557 insertions(+) create mode 120000 .agents/docs create mode 100644 .claude/docs/BOUNDARY_AGENT_GUIDE.md create mode 100644 AGENTS.md create mode 120000 CLAUDE.md create mode 100644 e2e_tests/AGENTS.md diff --git a/.agents/docs b/.agents/docs new file mode 120000 index 0000000..daf0269 --- /dev/null +++ b/.agents/docs @@ -0,0 +1 @@ +../.claude/docs \ No newline at end of file diff --git a/.claude/docs/BOUNDARY_AGENT_GUIDE.md b/.claude/docs/BOUNDARY_AGENT_GUIDE.md new file mode 100644 index 0000000..9f3a6b6 --- /dev/null +++ b/.claude/docs/BOUNDARY_AGENT_GUIDE.md @@ -0,0 +1,428 @@ +# Boundary agent guide + +This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. + +## Repository map + +| Path | Purpose | +|------|---------| +| `cmd/boundary/main.go` | Binary entrypoint. Creates the CLI command and exits with errors. | +| `cli/` | Serpent CLI, flags, environment variables, YAML config loading, privilege gate. | +| `config/` | App config, user info, session correlation config, header names. | +| `run/` | Platform dispatch. Linux runs a jail backend, non-Linux returns unsupported. | +| `proxy/` | HTTP and HTTPS filtering proxy, CONNECT support, TLS detection, audit, session correlation. | +| `rulesengine/` | Allow-rule parser and matcher. Default-deny policy. | +| `audit/` | Log auditor, socket auditor, multi-auditor, sequence counter. | +| `tls/` | Local CA creation/loading and per-host certificate generation. | +| `nsjail_manager/` | Default Linux namespace backend. 
Parent and child process orchestration. | +| `nsjail_manager/nsjail/` | Low-level veth, iptables, dummy DNS, env, and command runner code. | +| `landjail/` | Landlock backend using proxy env vars rather than transparent iptables routing. | +| `privilege/` | Linux privilege escalation through `sudo` and `setpriv`; non-Linux stubs. | +| `dnsdummy/` | Dummy DNS server used to prevent DNS exfiltration in namespace mode. | +| `log/` | slog setup to stderr or files. | +| `e2e_tests/` | Linux sudo tests that can mutate host networking. | +| `.github/workflows/` | CI, build, and release workflows. | + +## Architecture overview + +Boundary runs a target command in a restricted environment and sends its HTTP and HTTPS traffic through a local filtering proxy. Requests are evaluated against allow rules. Anything that does not match an allow rule is denied. + +Core concepts: + +- Default deny: no rule means no outbound HTTP or HTTPS request is allowed. +- Parent process: sets up proxying, audit, TLS, and jail infrastructure. +- Child process: runs inside the selected jail backend and executes the target command. +- Proxy: parses requests, evaluates allow rules, audits decisions, forwards allowed traffic, and blocks denied traffic. +- Auditor: logs every request decision to stderr and optionally to the Coder workspace-agent socket. +- TLS manager: creates a local CA and per-host certificates so HTTPS can be inspected. + +Boundary has two jail backends: + +- `nsjail`: default. Uses Linux network namespaces, veth pairs, iptables NAT and REDIRECT rules, and optional user namespaces. +- `landjail`: uses Landlock network restrictions. It relies on proxy environment variables instead of transparent iptables redirection. + +## Runtime flow + +High-level flow: + +1. `cmd/boundary/main.go` calls `cli.NewCommand(version)`. +2. `cli/cli.go` parses flags, environment variables, and optional YAML config into `config.CliConfig`. +3. 
`config.NewAppConfigFromCliConfig` builds `config.AppConfig` and validates session-correlation config. +4. If jail type is `nsjail`, `privilege.EnsurePrivileges()` re-execs through `sudo` and `setpriv` when needed. +5. `run.Run` generates a boundary session UUID and dispatches to `nsjail_manager.Run` or `landjail.Run`. +6. The selected backend decides whether the current process is a parent or child by checking `CHILD=true`. +7. The parent parses allow rules, builds the rule engine, sets up auditors, creates TLS config, starts the proxy, then starts the child process. +8. The child applies jail-specific network setup and runs the target command. +9. The proxy evaluates each HTTP or HTTPS request and audits the result. +10. The parent stops the proxy and cleans up host resources when the target command exits or a signal is received. + +## CLI and config + +The CLI is built with `github.com/coder/serpent` in `cli/cli.go`. + +Important config types: + +- `config.CliConfig`: serpent values for flags, environment variables, and YAML. +- `config.AppConfig`: runtime config passed into the jail backend and proxy setup. +- `config.SessionCorrelationConfig`: controls session-correlation header injection. +- `config.UserInfo`: resolves the effective user, including sudo scenarios. + +Important CLI behavior: + +- `--allow` is repeatable and CLI-only. +- YAML `allowlist` is merged with CLI `--allow` rules. +- `--jail-type` defaults to `nsjail`. +- `--use-real-dns` intentionally permits DNS exfiltration. Do not enable it by accident. +- `--disable-audit-logs` disables workspace-agent socket forwarding. It does not remove stderr logging. +- `--enable-session-correlation` requires configured inject targets or a valid fallback from `CODER_AGENT_URL`. +- `--log-proxy-socket-path` defaults to the Coder workspace-agent boundary log proxy socket path. + +When changing CLI flags: + +- Update README usage if behavior changes. 
+- Add or update config tests if parsing or validation changes. +- Check environment variable names. Some are shared with the Coder workspace agent. +- Preserve backwards compatibility unless the task explicitly allows breaking it. + +## Rules engine + +`rulesengine/` parses and evaluates allow rules. + +Rule grammar uses key-value tokens: + +```text +method=GET,POST domain=github.com path=/api/* +``` + +Supported keys: + +- `method`: one or more HTTP token values, comma-separated. `*` matches all methods. +- `domain`: hostname pattern. `*` can be a full label. +- `path`: one or more path patterns, comma-separated. + +Important matching semantics: + +- No matching allow rule means denied. +- `domain=github.com` matches only `github.com`. +- `domain=github.com` does not match `api.github.com`. +- `domain=*.github.com` matches subdomains like `api.github.com`. +- `domain=*.github.com` does not match the base domain `github.com`. +- To allow both a base domain and its subdomains, use two rules. +- Path wildcards are segment-based. A wildcard must be the entire segment. +- A path pattern ending in `*` can match additional path segments. + +When changing rule parsing or matching: + +- Update parser tests in `rulesengine/`. +- Update matcher tests in `rulesengine/`. +- Update README examples if user-visible behavior changes. +- Be careful with percent-encoded paths. Proxy forwarding preserves `RawPath` for cases like scoped npm package names. + +## Proxy + +`proxy/` contains the filtering proxy. It handles both transparent proxy traffic and explicit HTTP proxy traffic. + +Main files: + +- `proxy/proxy.go`: server lifecycle, TLS detection, HTTP and HTTPS processing, forwarding, block responses. +- `proxy/connect.go`: HTTP CONNECT tunnel support. +- `proxy/*_test.go`: proxy tests and framework. + +Request handling paths: + +1. Transparent HTTP: connection is not TLS, request is read directly, then evaluated. +2. 
Transparent HTTPS: first byte looks like TLS, boundary terminates TLS with a generated certificate, reads the HTTP request, then evaluates it. +3. Explicit HTTP proxy: client sends an absolute URL in the HTTP request. +4. Explicit HTTPS proxy: client sends CONNECT, boundary establishes a TLS tunnel, then reads HTTP requests inside the tunnel. + +Important proxy behavior: + +- Every request is audited before allow or deny handling completes. +- Audit sequence numbers are per proxy server instance and come from `audit.SequenceCounter`. +- Denied requests get a 403 response with suggested allow rules. +- Allowed requests are forwarded with a new upstream request. +- For GET and HEAD, forwarded request bodies are set to nil. +- Upstream responses are read fully so `Content-Length` can be set explicitly. +- Responses are normalized to HTTP/1.1 before writing back to the downstream client. +- Optional session-correlation headers are injected only when the request URL matches configured inject targets. + +When changing proxy behavior: + +- Prefer unit tests with `proxy/proxy_framework_test.go` and `httptest`. +- Avoid live network tests unless the behavior truly requires it. +- Test both allow and deny paths. +- Test both transparent and CONNECT paths when TLS behavior changes. +- Preserve audit behavior for both allowed and denied requests. + +## Audit + +`audit/` provides request auditing. + +Key types: + +- `audit.Request`: request decision payload. +- `audit.Auditor`: interface implemented by all auditors. +- `audit.LogAuditor`: writes structured logs through slog. +- `audit.SocketAuditor`: batches and forwards logs to the Coder workspace-agent socket. +- `audit.MultiAuditor`: fans out to multiple auditors. +- `audit.SequenceCounter`: atomic counter for per-request sequence numbers. + +Important behavior: + +- `SetupAuditor` always includes the log auditor. 
+- Socket forwarding is skipped when audit logs are disabled, the socket path is empty, or the socket does not exist. +- Socket auditor queues logs, batches them, retries connection failures, and reports drops. +- Allowed audit entries include the matching rule. +- Denied audit entries do not include a rule. +- Sequence numbers start at zero. + +When changing audit behavior: + +- Check `audit/socket_auditor_test.go` for batching, retry, drop, shutdown, and session ID expectations. +- Preserve the Coder boundary log proxy codec contract. +- Avoid blocking request handling on slow socket forwarding. + +## TLS + +`tls/` generates and loads certificates used for TLS interception. + +Key behavior: + +- A local CA is stored in the user's boundary config directory. +- Existing CA files are reused when possible. +- Per-host server certificates are generated on demand. +- The CA path is injected into child process environments so tools can trust boundary's generated certificates. + +When changing TLS behavior: + +- Preserve file ownership for the original user when running through sudo. +- Be careful with config directory paths from `config.UserInfo`. +- Consider the impact on curl, git, Python requests, and Node clients. +- Avoid broad certificate trust changes without explicit review. + +## nsjail backend + +`nsjail_manager/` is the default backend. + +Parent flow: + +1. Parse allow rules. +2. Build rule engine. +3. Set up audit. +4. Set up TLS and write CA certificate. +5. Create `nsjail.LinuxJail`. +6. Start the proxy. +7. Launch a child boundary process with `CHILD=true`. +8. Configure host-to-namespace communication after child PID exists. +9. Wait for child exit or signal. +10. Stop proxy and clean up iptables and veth state. + +Child flow: + +1. Wait for the jail-side veth interface. +2. Configure namespace networking. +3. Start dummy DNS and redirect DNS unless `--use-real-dns` is enabled. +4. Run the target command. 
+ +Low-level networking behavior: + +- Host-side address: `192.168.100.1/24`. +- Jail-side address: `192.168.100.2/24`. +- Fixed subnet: `192.168.100.0/24`. +- TCP traffic from the jail is redirected to the local HTTP proxy with iptables. +- Non-TCP forwarding rules allow return traffic for non-TCP flows. +- Dummy DNS prevents DNS exfiltration by redirecting DNS to local dummy responses. + +High-risk details: + +- Interface names are constrained by Linux's 15-character interface name limit. +- iptables cleanup must mirror setup rules. +- `--no-user-namespace` changes clone flags and UID/GID mappings. +- `CAP_NET_ADMIN` and sometimes `CAP_SYS_ADMIN` are required. +- Non-HTTP TCP protocols are redirected but the proxy only understands HTTP and TLS-style traffic. + +## landjail backend + +`landjail/` uses Linux Landlock network restrictions. + +Differences from nsjail: + +- It does not set up transparent iptables routing. +- It sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` for the child. +- It clears `NO_PROXY` and `no_proxy` so clients do not bypass boundary. +- It configures CA-related environment variables for common clients. +- It restricts TCP connect to the proxy port. + +When changing landjail: + +- Check kernel and Landlock version assumptions. +- Preserve proxy env injection unless a task explicitly changes the model. +- Test that denied direct connections remain blocked. +- Remember that behavior depends on clients honoring proxy environment variables. + +## Privilege model + +`privilege/` handles Linux privilege escalation for the default nsjail backend. + +Behavior: + +- If needed, boundary re-execs through `sudo` and `setpriv`. +- It keeps the original user's UID/GID where possible. +- It adds ambient and inheritable capabilities required for network namespace and iptables work. +- Non-Linux builds use stubs. + +When changing privilege code: + +- Ask for review before implementation. 
+- Test both already-privileged and needs-escalation paths where possible. +- Preserve environment variables needed by child processes and the target command. +- Be cautious with PATH handling and sudo behavior. + +## Testing + +Normal validation: + +```sh +make unit-test +make build +``` + +Formatting and linting: + +```sh +make fmt +make fmt-check +make lint +``` + +E2E validation: + +```sh +make e2e-test +``` + +Important test facts: + +- `make unit-test` runs `go test -v -race $(go list ./... | grep -v e2e_tests)`. +- `make e2e-test` runs `sudo $(which go) test -v -race ./e2e_tests -count=1`. +- `make e2e-test` targets only the root `e2e_tests` package, not all subpackages. +- `make test-coverage` runs `go test -v -race -coverprofile=coverage.out ./...`, so it may include e2e packages. +- The Makefile currently does not define a `test` target. Do not use `make test` unless the Makefile changes. + +Testing guidance by area: + +- Rules changes: use parser and matcher tests in `rulesengine/`. +- Proxy changes: prefer `proxy/` unit tests with `httptest` and the proxy test framework. +- Config changes: use `config/*_test.go` and explicit environment slices. +- Audit changes: use `audit/*_test.go`, especially socket auditor behavior. +- nsjail and landjail changes: add focused unit tests where possible, then run e2e only on a suitable Linux sudo host. + +Avoid adding new sleeps in tests. Prefer readiness checks, channels, contexts, test servers, and explicit process state checks. Existing tests contain sleeps, but that should not become the default pattern for new code. + +## CI and releases + +CI lives in `.github/workflows/ci.yml`. + +Current CI behavior: + +- Uses Go 1.25. +- Runs `make deps`. +- Runs `make fmt-check` and `make lint` in the lint job. +- Installs `golangci-lint` before linting. +- Bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` before tests on Linux. +- Runs `make unit-test`. +- Runs `make e2e-test`. +- Runs `make build`. 
+ +Build and release workflows: + +- `make build-all` builds Linux amd64 and Linux arm64 binaries. +- Build and release workflow files include Darwin artifact upload paths even though `make build-all` currently creates Linux binaries only. +- Release archives can be created from local `build/` output or downloaded workflow artifacts. + +When changing CI or releases: + +- Confirm Makefile targets exist before referencing them. +- Keep README, RELEASES, Makefile help, and workflows aligned. +- Avoid changing binary names or archive names without considering `install.sh`. +- Check whether artifacts are actually produced before uploading them. + +## Troubleshooting + +### `make test` fails with no rule + +Use `make unit-test` for regular tests. The current Makefile does not define `test`. + +### E2E tests fail with DNS issues + +CI bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` so namespace tests can reach upstream DNS instead of the host stub resolver. Local environments may need similar attention. + +### E2E tests leave host networking residue + +Inspect iptables and veth state. Cleanup should remove rules that setup added. Be careful before deleting unrelated host rules. + +### Boundary cannot escalate privileges + +Check that `sudo` and `setpriv` exist and that the current user can use sudo. The default nsjail backend needs capabilities for network setup. + +### Port conflicts + +Default proxy port is `8080`. Default pprof port is `6060`. Use CLI flags or environment variables when running multiple instances. + +### HTTPS clients reject certificates + +Check the CA path in the user config directory and the environment variables injected into the child process. Different clients use different CA variables. + +### Rules do not match as expected + +Check exact vs wildcard domain semantics first. `domain=github.com` and `domain=*.github.com` are different rules. 
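The exact versus wildcard distinction can be checked with a small sketch. This is not the `rulesengine/` implementation: `matchDomain` is a hypothetical helper, and it assumes that a `*` label matches exactly one hostname label, which the guide does not settle for deeper subdomains. It only reproduces the four documented cases.

```go
package main

import (
	"fmt"
	"strings"
)

// matchDomain is a hypothetical helper, not Boundary's rulesengine API.
// It compares pattern and host label by label. A "*" label matches exactly
// one host label, which is an assumption of this sketch.
func matchDomain(pattern, host string) bool {
	p := strings.Split(strings.ToLower(pattern), ".")
	h := strings.Split(strings.ToLower(host), ".")
	if len(p) != len(h) {
		return false
	}
	for i := range p {
		if p[i] != "*" && p[i] != h[i] {
			return false
		}
	}
	return true
}

func main() {
	cases := []struct{ pattern, host string }{
		{"github.com", "github.com"},
		{"github.com", "api.github.com"},
		{"*.github.com", "api.github.com"},
		{"*.github.com", "github.com"},
	}
	for _, c := range cases {
		fmt.Printf("%-14s %-16s %v\n", c.pattern, c.host, matchDomain(c.pattern, c.host))
	}
}
```

When a rule does not behave as expected, writing the failing host and pattern as a table-driven case like this is usually the quickest way to see which side of the distinction applies.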
+ +## Agent failure catalog + +### Symptom: agent runs `make test` + +Cause: generic Go habit or stale README/help references. + +Fix: inspect the Makefile and run `make unit-test` for normal validation. Use e2e only when appropriate. + +### Symptom: agent runs e2e tests in an unsuitable environment + +Cause: treating e2e tests like normal unit tests. + +Fix: stop and verify Linux, sudo, iptables, namespace support, required tools, and cleanup expectations. + +### Symptom: proxy tests miss CONNECT or transparent TLS paths + +Cause: testing only one request path. + +Fix: add coverage for the path affected by the code change. TLS, HTTP, and CONNECT can differ. + +### Symptom: allow-rule change breaks subdomain behavior + +Cause: confusing exact domain and wildcard domain matching. + +Fix: update tests for base domain, subdomain, and unrelated domain cases. + +### Symptom: audit socket changes block request handling + +Cause: doing synchronous socket work in the request path. + +Fix: keep queueing and batching behavior. Preserve drop and retry tests. + +### Symptom: workflow uploads artifacts that were never built + +Cause: workflow artifact paths drift from `make build-all` outputs. + +Fix: align Makefile, workflow uploads, RELEASES, and install script expectations. + +## Review checklist + +Before opening a PR: + +- [ ] The change is narrow and avoids unrelated cleanup. +- [ ] `go fmt` or `make fmt` was run for Go changes. +- [ ] Focused tests were run for the changed area. +- [ ] `make unit-test` was run unless the change is docs-only and the user agreed to skip it. +- [ ] E2E tests were only run on a suitable Linux sudo host. +- [ ] README, Makefile, workflows, and release docs are aligned when commands or binaries change. +- [ ] Privilege, TLS, iptables, and rule grammar changes received explicit review. 
diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..5c4d603 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,58 @@ +# Boundary agent instructions + +Boundary is a Linux network isolation tool for monitoring and restricting HTTP and HTTPS requests from child processes. It is security-sensitive code that can mutate host networking during e2e tests. + +Start here, then read the relevant sections in [.claude/docs/BOUNDARY_AGENT_GUIDE.md](.claude/docs/BOUNDARY_AGENT_GUIDE.md). + +## Non-negotiable rules + +- Do not run host-mutating e2e tests casually. They require Linux, sudo, iptables, network namespaces, and cleanup discipline. +- Do not assume `make test` exists. Use `make unit-test` for normal validation and `make e2e-test` only when a Linux sudo environment is appropriate. +- Do not skip `go fmt` for Go changes. +- Keep changes narrow. Avoid unrelated cleanup in security, networking, privilege, TLS, or audit code. +- Preserve Linux build tags in platform-specific files. +- Ask before changing privilege escalation, iptables rules, certificate trust behavior, release workflow semantics, or the allow-rule grammar. 
+ +## Fast commands + +| Task | Command | Notes | +|------|---------|-------| +| Dependencies | `make deps` | Downloads and verifies Go modules | +| Build | `make build` | Builds `./boundary` for the current platform | +| Build all | `make build-all` | Builds Linux amd64 and arm64 binaries | +| Unit tests | `make unit-test` | Race-enabled tests excluding e2e packages | +| E2E tests | `make e2e-test` | Linux only, needs sudo, mutates host networking | +| Coverage | `make test-coverage` | Runs `go test ./...`; may include e2e packages | +| Format | `make fmt` | Runs `go fmt ./...` | +| Format check | `make fmt-check` | Uses `gofmt -l .` | +| Lint | `make lint` | Requires `golangci-lint` | +| Clean | `make clean` | Removes build and coverage artifacts | + +## Read before editing + +- Repository map and architecture: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map](.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map) +- Runtime flow: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow](.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow) +- CLI and config: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config](.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config) +- Rules engine: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine](.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine) +- Proxy behavior: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#proxy](.claude/docs/BOUNDARY_AGENT_GUIDE.md#proxy) +- Audit logs: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#audit](.claude/docs/BOUNDARY_AGENT_GUIDE.md#audit) +- TLS certificates: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#tls](.claude/docs/BOUNDARY_AGENT_GUIDE.md#tls) +- nsjail backend: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#nsjail-backend](.claude/docs/BOUNDARY_AGENT_GUIDE.md#nsjail-backend) +- landjail backend: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#landjail-backend](.claude/docs/BOUNDARY_AGENT_GUIDE.md#landjail-backend) +- Testing: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#testing](.claude/docs/BOUNDARY_AGENT_GUIDE.md#testing) +- CI and releases: 
[.claude/docs/BOUNDARY_AGENT_GUIDE.md#ci-and-releases](.claude/docs/BOUNDARY_AGENT_GUIDE.md#ci-and-releases) +- Troubleshooting: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#troubleshooting](.claude/docs/BOUNDARY_AGENT_GUIDE.md#troubleshooting) + +## High-risk areas + +- `e2e_tests/`: read [e2e_tests/AGENTS.md](e2e_tests/AGENTS.md) first. +- `nsjail_manager/`: Linux namespaces, veth, iptables, dummy DNS, privilege-sensitive cleanup. +- `landjail/`: Landlock restrictions, proxy environment injection, and `NO_PROXY` clearing. +- `proxy/`: transparent proxying, explicit CONNECT, TLS MITM, audit sequencing, session-correlation headers. +- `rulesengine/`: exact and wildcard domain semantics. Grammar changes need broad test coverage. +- `tls/`: local CA lifecycle, generated certificates, ownership, and client trust behavior. +- `.github/workflows/`: release and build workflow changes can affect shipped binaries. + +## Compatibility + +`CLAUDE.md` should mirror this file for Claude-style agent runtimes. `.agents/docs` points to `.claude/docs` for agent runtimes that look under `.agents`. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000..47dc3e3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/e2e_tests/AGENTS.md b/e2e_tests/AGENTS.md new file mode 100644 index 0000000..c4f30da --- /dev/null +++ b/e2e_tests/AGENTS.md @@ -0,0 +1,69 @@ +# Boundary e2e test guidance + +E2E tests in this directory are not normal unit tests. They can mutate host networking and require a suitable Linux sudo environment. + +Read this file before changing or running e2e tests. + +## Requirements + +Expected tools and host features include: + +- Linux +- sudo +- Go +- iptables +- ip +- nsenter +- curl +- dig +- nc +- Linux network namespaces +- Landlock support for landjail tests + +## Safety rules + +- Do not run e2e tests casually in a shared or fragile environment. +- Prefer focused package or test-name runs when debugging. 
+- Expect tests to create boundary binaries under temporary directories. +- Expect tests to create or inspect iptables rules, veth interfaces, and network namespaces. +- Check cleanup when a test fails or is interrupted. +- Do not delete unrelated host iptables rules during cleanup or debugging. + +## Commands + +The Makefile target is: + +```sh +make e2e-test +``` + +It currently runs: + +```sh +sudo $(which go) test -v -race ./e2e_tests -count=1 +``` + +That target runs the root `e2e_tests` package only. It does not run every e2e subpackage. If you need subpackage coverage, choose the package deliberately and document what you ran. + +Examples: + +```sh +sudo $(which go) test -v -race ./e2e_tests/nsjail -count=1 +sudo $(which go) test -v -race ./e2e_tests/landjail -count=1 +``` + +## Common pitfalls + +- DNS inside namespaces can fail if the host uses a stub resolver at `127.0.0.53`. +- iptables cleanup must remove exactly the rules added by setup. +- Port conflicts can occur when another boundary or proxy process is running. +- Existing sleeps in e2e helpers are not a pattern to copy. Prefer readiness checks when adding new tests. +- Some tests depend on external network behavior. Keep assertions focused and diagnostics clear. + +## When editing tests + +- Add targeted assertions for the behavior under test. +- Use unique ports, names, or temporary directories when tests can run concurrently. +- Preserve cleanup with `t.Cleanup` where possible. +- Capture enough diagnostics to debug host networking failures. +- Keep unit-level logic in package tests outside e2e when possible. 
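As an illustration of preferring readiness checks over fixed sleeps, here is a minimal sketch. `waitForTCP` and `demo` are hypothetical names, not helpers from this repository, and the timeouts are arbitrary. The short pause inside the loop is only a polling interval bounded by the deadline, not a fixed wait.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForTCP polls addr until a TCP connection succeeds or timeout elapses.
// Hypothetical helper for illustration only.
func waitForTCP(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 100*time.Millisecond)
		if err == nil {
			return conn.Close()
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not ready after %s: %w", addr, timeout, err)
		}
		// Short polling interval; the loop is bounded by the deadline above.
		time.Sleep(10 * time.Millisecond)
	}
}

// demo starts a listener on an OS-assigned loopback port and waits for it,
// which also avoids hard-coded ports that could conflict between tests.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	return waitForTCP(ln.Addr().String(), time.Second)
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("ready")
}
```

In a real test the listener or process shutdown would typically be registered with `t.Cleanup` rather than `defer`, so it also runs when a helper fails the test early.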
From 300f45185ef84e4b44277276307e121cc3307dc0 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:11:56 +0000 Subject: [PATCH 2/8] docs: explain boundary architecture --- .claude/docs/BOUNDARY_AGENT_GUIDE.md | 3 +- AGENTS.md | 5 +- ARCHITECTURE.md | 293 ++++++++++++++++++++++++++- 3 files changed, 295 insertions(+), 6 deletions(-) diff --git a/.claude/docs/BOUNDARY_AGENT_GUIDE.md b/.claude/docs/BOUNDARY_AGENT_GUIDE.md index 9f3a6b6..c4befff 100644 --- a/.claude/docs/BOUNDARY_AGENT_GUIDE.md +++ b/.claude/docs/BOUNDARY_AGENT_GUIDE.md @@ -1,6 +1,6 @@ # Boundary agent guide -This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. +This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read [`ARCHITECTURE.md`](../../ARCHITECTURE.md). ## Repository map @@ -22,6 +22,7 @@ This guide gives autonomous agents the context needed to change `github.com/code | `log/` | slog setup to stderr or files. | | `e2e_tests/` | Linux sudo tests that can mutate host networking. | | `.github/workflows/` | CI, build, and release workflows. | +| `ARCHITECTURE.md` | Human-facing overview of how Boundary works. | ## Architecture overview diff --git a/AGENTS.md b/AGENTS.md index 5c4d603..c8c63fa 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,7 +2,7 @@ Boundary is a Linux network isolation tool for monitoring and restricting HTTP and HTTPS requests from child processes. It is security-sensitive code that can mutate host networking during e2e tests. -Start here, then read the relevant sections in [.claude/docs/BOUNDARY_AGENT_GUIDE.md](.claude/docs/BOUNDARY_AGENT_GUIDE.md). 
+Start here, read [ARCHITECTURE.md](ARCHITECTURE.md) for the human system overview, then read the relevant agent workflow sections in [.claude/docs/BOUNDARY_AGENT_GUIDE.md](.claude/docs/BOUNDARY_AGENT_GUIDE.md). ## Non-negotiable rules @@ -30,7 +30,8 @@ Start here, then read the relevant sections in [.claude/docs/BOUNDARY_AGENT_GUID ## Read before editing -- Repository map and architecture: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map](.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map) +- Human architecture overview: [ARCHITECTURE.md](ARCHITECTURE.md) +- Repository map and agent architecture notes: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map](.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map) - Runtime flow: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow](.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow) - CLI and config: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config](.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config) - Rules engine: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine](.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 48dc6d6..9b2c18b 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,9 +1,296 @@ -# Boundary Architecture +# Boundary architecture + +Boundary is a Linux network isolation tool that runs a child process with restricted network access. It intercepts HTTP and HTTPS traffic, evaluates each request against allow rules, and records an audit trail of what was allowed or denied. + +The practical goal is simple: run an agent or command with a default-deny network policy while still letting approved HTTP and HTTPS requests work normally. + +## High-level model + +Boundary has three moving parts: + +1. **CLI and configuration** parse rules, logging options, jail options, and the target command. +2. **Jail backend** starts the target command in a restricted environment. +3. 
**Proxy and policy engine** inspect HTTP and HTTPS requests, allow or block them, and emit audit logs. + +```text +user shell + | + | boundary --allow "domain=github.com" [separator] command args... + v +boundary parent process + | + | parses config, creates policy engine, starts proxy, starts jail + v +restricted child process + | + | HTTP and HTTPS traffic + v +boundary proxy + | + | evaluates method, host, path + +--> allowed: forward to upstream server + +--> denied: return HTTP 403 and audit the denial +``` + +## Repository map + +| Path | Responsibility | +|------|----------------| +| `cmd/boundary/main.go` | Binary entrypoint. Builds and runs the CLI command. | +| `cli/` | Command-line interface, flags, environment variables, YAML config loading, and privilege setup. | +| `config/` | Runtime configuration, user information, and session-correlation settings. | +| `run/` | Platform dispatch. Linux runs a jail backend. Non-Linux returns an unsupported-platform error. | +| `rulesengine/` | Allow-rule parsing and matching. | +| `proxy/` | HTTP and HTTPS proxy, transparent TLS detection, CONNECT support, forwarding, blocking, auditing, and session-correlation header injection. | +| `audit/` | Structured stderr audit logging and optional Coder workspace-agent socket forwarding. | +| `tls/` | Local CA management and per-host certificate generation for HTTPS interception. | +| `nsjail_manager/` | Default jail backend using Linux network namespaces, veth pairs, iptables, and dummy DNS. | +| `landjail/` | Alternative jail backend using Landlock restrictions and proxy environment variables. | +| `privilege/` | Linux privilege escalation through `sudo` and `setpriv` for the default backend. | +| `dnsdummy/` | DNS server used by the namespace backend to prevent DNS exfiltration. | +| `e2e_tests/` | Linux integration tests that require sudo and can mutate host networking. 
| + +## Startup flow + +The startup path is: + +```text +cmd/boundary/main.go + -> cli.NewCommand + -> config.NewAppConfigFromCliConfig + -> privilege.EnsurePrivileges, for nsjail only + -> log.SetupLogging + -> run.Run + -> nsjail_manager.Run or landjail.Run +``` + +The CLI builds a `config.AppConfig` from flags, environment variables, optional YAML, and the target command. Then `run.Run` assigns a new session UUID and dispatches to the requested jail backend. + +The default jail type is `nsjail`. That backend needs Linux network privileges, so the CLI calls `privilege.EnsurePrivileges()` before entering the runtime. If the current process does not have the required capabilities, Boundary re-execs itself through `sudo` and `setpriv` with the minimal capabilities it needs for networking setup. + +The `landjail` backend does not use the same privilege escalation path. + +## Parent and child process model + +Both jail backends use a parent and child process model. The selected backend checks the `CHILD=true` environment variable to decide which role the current process should run. + +### Parent process + +The parent process owns setup and cleanup: + +1. Parse allow rules. +2. Create the rule engine. +3. Set up audit logging. +4. Create or load the local CA. +5. Start the HTTP proxy. +6. Start the child process. +7. Wait for the child process to exit or for a termination signal. +8. Stop the proxy. +9. Clean up backend-specific resources. + +### Child process + +The child process runs the target command inside the restricted environment. Backend-specific setup happens before the target command starts. + +For `nsjail`, the child configures namespace networking and DNS behavior before running the target. For `landjail`, the child applies Landlock network restrictions before running the target. + +## Policy model + +Boundary uses a default-deny policy. Requests are allowed only when at least one allow rule matches. 
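The default-deny behavior can be sketched in a few lines. This is a simplified illustration, not the `rulesengine/` code: `allowRule` and `evaluate` are hypothetical names, and the sketch matches only on method and exact host, ignoring wildcards and paths.

```go
package main

import "fmt"

// allowRule is a hypothetical, simplified rule: an exact host plus an
// optional set of methods. An empty method set means any method.
type allowRule struct {
	Methods map[string]bool
	Host    string
}

// evaluate reports whether a request is allowed and the index of the first
// matching rule. With no matching rule the request is denied and the index
// is -1, which is the default-deny case.
func evaluate(rules []allowRule, method, host string) (bool, int) {
	for i, r := range rules {
		if r.Host != host {
			continue
		}
		if len(r.Methods) == 0 || r.Methods[method] {
			return true, i
		}
	}
	return false, -1
}

func main() {
	rules := []allowRule{
		{Methods: map[string]bool{"GET": true}, Host: "github.com"},
	}
	fmt.Println(evaluate(nil, "GET", "github.com"))    // no rules at all
	fmt.Println(evaluate(rules, "GET", "github.com"))  // matching rule
	fmt.Println(evaluate(rules, "POST", "github.com")) // method not listed
}
```

Returning the matched rule alongside the decision is what lets an audit entry record why a request was allowed, while a denial carries no rule.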
+ +Allow rules are strings made of key-value pairs: + +```text +method=GET,HEAD domain=github.com path=/api/* +``` + +Supported keys are: + +- `method`: one or more HTTP methods, comma-separated. `*` matches every method. +- `domain`: an exact host or wildcard host pattern. +- `path`: one or more path patterns, comma-separated. + +Important matching rules: + +- `domain=github.com` matches only `github.com`. +- `domain=github.com` does not match `api.github.com`. +- `domain=*.github.com` matches subdomains such as `api.github.com`. +- `domain=*.github.com` does not match `github.com`. +- To allow a base domain and its subdomains, configure both patterns. +- Path wildcards are segment-based. A wildcard must be a whole path segment. + +The engine returns both the allow or deny decision and the matching rule, if one matched. Audit logs include the matched rule for allowed requests. + +## Proxy model + +The proxy is the enforcement point for HTTP and HTTPS traffic. + +It supports two styles of traffic: + +1. **Transparent traffic**, where the target process does not know about the proxy. The `nsjail` backend redirects TCP traffic to Boundary with iptables. +2. **Explicit proxy traffic**, where the target process uses `HTTP_PROXY` and `HTTPS_PROXY`. The `landjail` backend uses this model. + +### HTTP requests + +For plain HTTP, the proxy reads the request, reconstructs the full URL when needed, evaluates the method, host, and path, then either forwards the request or returns a 403 response. + +### HTTPS requests + +For HTTPS, Boundary acts as a local TLS endpoint so it can inspect the HTTP request inside the encrypted stream. It uses a local CA and generates per-host certificates on demand. + +The target process must trust Boundary's CA. Boundary injects common CA environment variables into the child process so tools such as curl, git, Python requests, and Node can trust the generated certificates. 
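A rough Go sketch of how such CA trust variables could be assembled for a child process follows. The variable names are the ones listed under "TLS and certificate trust". The helper name `caTrustEnv`, the example certificate path, and the choice to point `SSL_CERT_DIR` at the certificate's parent directory are assumptions for illustration, not Boundary's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// caTrustEnv returns KEY=VALUE entries that point common clients at a CA
// certificate file. Hypothetical helper, not part of Boundary's API.
func caTrustEnv(caCertPath string) []string {
	return []string{
		"SSL_CERT_FILE=" + caCertPath,
		// Assumption: the directory variant points at the folder holding the cert.
		"SSL_CERT_DIR=" + filepath.Dir(caCertPath),
		"CURL_CA_BUNDLE=" + caCertPath,
		"GIT_SSL_CAINFO=" + caCertPath,
		"REQUESTS_CA_BUNDLE=" + caCertPath,
		"NODE_EXTRA_CA_CERTS=" + caCertPath,
	}
}

func main() {
	// Hypothetical path; the real location comes from the user's config directory.
	caPath := "/home/user/.config/boundary/ca-cert.pem"

	// Later entries override earlier duplicates when passed to exec.Cmd.Env.
	childEnv := append(os.Environ(), caTrustEnv(caPath)...)
	fmt.Println("child environment entries:", len(childEnv))
	for _, kv := range caTrustEnv(caPath) {
		fmt.Println(kv)
	}
}
```

Setting several variables at once is needed because each client reads a different one; no single variable covers curl, git, Python requests, and Node together.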
+ +### CONNECT requests + +When a client uses Boundary as an explicit HTTP proxy for HTTPS, it sends a CONNECT request. Boundary accepts the CONNECT tunnel, performs TLS with the client, reads HTTP requests from inside the tunnel, and evaluates each request independently. + +### Forwarding and blocking + +For allowed requests, the proxy creates a new upstream request, copies appropriate headers, optionally injects session-correlation headers, and writes the upstream response back to the client. + +For denied requests, the proxy returns HTTP 403 with a short message and example allow rules. + +Every request is audited before the allow or deny handling completes. + +## nsjail backend + +`nsjail` is the default backend. It provides transparent network interception with Linux networking primitives. + +The backend creates a point-to-point network between the host and child namespace: + +```text +host namespace child network namespace + +boundary proxy :8080 target command + ^ | + | | TCP traffic +iptables REDIRECT v + | veth jail side +veth host side +``` + +Key details: + +- The host side of the veth pair uses `192.168.100.1/24`. +- The child side uses `192.168.100.2/24`. +- The fixed subnet is `192.168.100.0/24`. +- iptables NAT and REDIRECT rules send TCP traffic from the child namespace to the Boundary proxy. +- Non-TCP forwarding rules allow return traffic for non-TCP flows. +- A dummy DNS server can run inside the namespace to prevent DNS exfiltration. +- `--use-real-dns` intentionally disables the dummy DNS behavior. +- `--no-user-namespace` disables user namespace creation for restricted environments. + +The parent process configures host-side networking before the child runs. Once the child process exists, the parent moves the jail-side veth into the child's network namespace. The child then configures its IP address, loopback, and default route. + +Cleanup removes the iptables rules and veth interface created during setup. 
+ +## landjail backend + +`landjail` is an alternative backend based on Linux Landlock network restrictions. + +Unlike `nsjail`, it does not rely on transparent iptables redirection. Instead, it configures the child process to use Boundary as an explicit proxy: + +- `HTTP_PROXY` +- `HTTPS_PROXY` +- `http_proxy` +- `https_proxy` + +It also clears `NO_PROXY` and `no_proxy` so the target command cannot bypass Boundary through proxy bypass lists. + +Landlock restricts the child so it can connect only to the Boundary proxy port. This means the backend depends on clients honoring proxy environment variables. A client that ignores those variables will generally fail to connect rather than bypass Boundary. + +## TLS and certificate trust + +Boundary uses TLS interception for HTTPS so it can evaluate host, path, method, and headers in the request. + +The TLS manager: + +1. Finds the user's Boundary config directory. +2. Loads an existing local CA if present. +3. Generates a new local CA if needed. +4. Writes the CA certificate for child processes to trust. +5. Generates per-host certificates for incoming TLS connections. + +The jail backends set environment variables for common tools: + +- `SSL_CERT_FILE` +- `SSL_CERT_DIR` +- `CURL_CA_BUNDLE` +- `GIT_SSL_CAINFO` +- `REQUESTS_CA_BUNDLE` +- `NODE_EXTRA_CA_CERTS` + +When Boundary runs through sudo, ownership and paths must still refer to the original user, not root, where possible. + +## Audit logging + +Boundary audits every HTTP and HTTPS request that reaches the proxy. + +An audit record includes: + +- method +- URL +- host +- allowed or denied decision +- matching rule for allowed requests +- per-session sequence number + +Boundary always creates a stderr log auditor. When running inside a compatible Coder workspace, it can also forward audit batches to the workspace agent over a Unix socket. The workspace agent then forwards the logs to coderd for centralized logging. + +`--disable-audit-logs` disables socket forwarding. 
It does not remove stderr logging. + +## Session correlation + +The proxy package contains support for injecting session-correlation headers into selected outbound requests. This is intended for Coder AI Gateway flows where downstream services need to correlate a Boundary audit event with an upstream request. + +The headers are defined in `config/session_correlation.go`: + +- `X-Coder-Agent-Firewall-Session-Id` +- `X-Coder-Agent-Firewall-Sequence-Number` + +Injection targets use the same rule engine semantics as normal allow rules. When changing this area, verify the end-to-end runtime path from CLI config through the selected jail backend into `proxy.Config`; unit tests for proxy support alone are not enough. + +## Security properties and limitations + +Boundary is designed for HTTP and HTTPS control. The default policy is deny, but the enforcement point is the proxy and the selected jail backend. + +Important limitations: + +- Boundary is Linux-only for runtime enforcement. +- The `nsjail` backend redirects TCP traffic, but the proxy understands HTTP, HTTPS, and CONNECT-style traffic. Arbitrary non-HTTP TCP protocols are not supported as normal allowed traffic. +- DNS behavior is backend-specific. The namespace backend uses dummy DNS by default to reduce DNS exfiltration. `--use-real-dns` changes that intentionally. +- The landjail backend depends on clients using proxy environment variables. +- The fixed namespace subnet can conflict with local networking in unusual environments. +- E2E tests can mutate host networking and require careful cleanup. + +## Development notes + +Useful commands: + +```sh +make build +make unit-test +make fmt +make fmt-check +make lint +``` + +E2E tests require Linux and sudo: + +```sh +make e2e-test +``` + +Read `e2e_tests/AGENTS.md` before running or changing e2e tests. 
+ +## Diagrams and related work + +The original design sketch is preserved here for context: Boundary -# Alternative Architectures -## Anthropic SRT +Anthropic's sandbox runtime is a related architecture worth comparing when thinking about alternative isolation designs: + https://github.com/anthropic-experimental/sandbox-runtime SRT From 121870569618954a2b04b689cd4d72bf663f02ca Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:19:43 +0000 Subject: [PATCH 3/8] docs: consolidate markdown guides --- .claude/docs/BOUNDARY_AGENT_GUIDE.md | 430 +-------------------------- AGENTS.md | 65 +--- ARCHITECTURE.md | 297 +----------------- README.md | 2 +- docs/agent-guide.md | 429 ++++++++++++++++++++++++++ docs/architecture.md | 296 ++++++++++++++++++ docs/e2e-tests.md | 69 +++++ e2e_tests/AGENTS.md | 70 +---- 8 files changed, 813 insertions(+), 845 deletions(-) mode change 100644 => 120000 .claude/docs/BOUNDARY_AGENT_GUIDE.md mode change 100644 => 120000 ARCHITECTURE.md create mode 100644 docs/agent-guide.md create mode 100644 docs/architecture.md create mode 100644 docs/e2e-tests.md mode change 100644 => 120000 e2e_tests/AGENTS.md diff --git a/.claude/docs/BOUNDARY_AGENT_GUIDE.md b/.claude/docs/BOUNDARY_AGENT_GUIDE.md deleted file mode 100644 index c4befff..0000000 --- a/.claude/docs/BOUNDARY_AGENT_GUIDE.md +++ /dev/null @@ -1,429 +0,0 @@ -# Boundary agent guide - -This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read [`ARCHITECTURE.md`](../../ARCHITECTURE.md). - -## Repository map - -| Path | Purpose | -|------|---------| -| `cmd/boundary/main.go` | Binary entrypoint. Creates the CLI command and exits with errors. | -| `cli/` | Serpent CLI, flags, environment variables, YAML config loading, privilege gate. 
| -| `config/` | App config, user info, session correlation config, header names. | -| `run/` | Platform dispatch. Linux runs a jail backend, non-Linux returns unsupported. | -| `proxy/` | HTTP and HTTPS filtering proxy, CONNECT support, TLS detection, audit, session correlation. | -| `rulesengine/` | Allow-rule parser and matcher. Default-deny policy. | -| `audit/` | Log auditor, socket auditor, multi-auditor, sequence counter. | -| `tls/` | Local CA creation/loading and per-host certificate generation. | -| `nsjail_manager/` | Default Linux namespace backend. Parent and child process orchestration. | -| `nsjail_manager/nsjail/` | Low-level veth, iptables, dummy DNS, env, and command runner code. | -| `landjail/` | Landlock backend using proxy env vars rather than transparent iptables routing. | -| `privilege/` | Linux privilege escalation through `sudo` and `setpriv`; non-Linux stubs. | -| `dnsdummy/` | Dummy DNS server used to prevent DNS exfiltration in namespace mode. | -| `log/` | slog setup to stderr or files. | -| `e2e_tests/` | Linux sudo tests that can mutate host networking. | -| `.github/workflows/` | CI, build, and release workflows. | -| `ARCHITECTURE.md` | Human-facing overview of how Boundary works. | - -## Architecture overview - -Boundary runs a target command in a restricted environment and sends its HTTP and HTTPS traffic through a local filtering proxy. Requests are evaluated against allow rules. Anything that does not match an allow rule is denied. - -Core concepts: - -- Default deny: no rule means no outbound HTTP or HTTPS request is allowed. -- Parent process: sets up proxying, audit, TLS, and jail infrastructure. -- Child process: runs inside the selected jail backend and executes the target command. -- Proxy: parses requests, evaluates allow rules, audits decisions, forwards allowed traffic, and blocks denied traffic. -- Auditor: logs every request decision to stderr and optionally to the Coder workspace-agent socket. 
-- TLS manager: creates a local CA and per-host certificates so HTTPS can be inspected. - -Boundary has two jail backends: - -- `nsjail`: default. Uses Linux network namespaces, veth pairs, iptables NAT and REDIRECT rules, and optional user namespaces. -- `landjail`: uses Landlock network restrictions. It relies on proxy environment variables instead of transparent iptables redirection. - -## Runtime flow - -High-level flow: - -1. `cmd/boundary/main.go` calls `cli.NewCommand(version)`. -2. `cli/cli.go` parses flags, environment variables, and optional YAML config into `config.CliConfig`. -3. `config.NewAppConfigFromCliConfig` builds `config.AppConfig` and validates session-correlation config. -4. If jail type is `nsjail`, `privilege.EnsurePrivileges()` re-execs through `sudo` and `setpriv` when needed. -5. `run.Run` generates a boundary session UUID and dispatches to `nsjail_manager.Run` or `landjail.Run`. -6. The selected backend decides whether the current process is a parent or child by checking `CHILD=true`. -7. The parent parses allow rules, builds the rule engine, sets up auditors, creates TLS config, starts the proxy, then starts the child process. -8. The child applies jail-specific network setup and runs the target command. -9. The proxy evaluates each HTTP or HTTPS request and audits the result. -10. The parent stops the proxy and cleans up host resources when the target command exits or a signal is received. - -## CLI and config - -The CLI is built with `github.com/coder/serpent` in `cli/cli.go`. - -Important config types: - -- `config.CliConfig`: serpent values for flags, environment variables, and YAML. -- `config.AppConfig`: runtime config passed into the jail backend and proxy setup. -- `config.SessionCorrelationConfig`: controls session-correlation header injection. -- `config.UserInfo`: resolves the effective user, including sudo scenarios. - -Important CLI behavior: - -- `--allow` is repeatable and CLI-only. 
-- YAML `allowlist` is merged with CLI `--allow` rules. -- `--jail-type` defaults to `nsjail`. -- `--use-real-dns` intentionally permits DNS exfiltration. Do not enable it by accident. -- `--disable-audit-logs` disables workspace-agent socket forwarding. It does not remove stderr logging. -- `--enable-session-correlation` requires configured inject targets or a valid fallback from `CODER_AGENT_URL`. -- `--log-proxy-socket-path` defaults to the Coder workspace-agent boundary log proxy socket path. - -When changing CLI flags: - -- Update README usage if behavior changes. -- Add or update config tests if parsing or validation changes. -- Check environment variable names. Some are shared with the Coder workspace agent. -- Preserve backwards compatibility unless the task explicitly allows breaking it. - -## Rules engine - -`rulesengine/` parses and evaluates allow rules. - -Rule grammar uses key-value tokens: - -```text -method=GET,POST domain=github.com path=/api/* -``` - -Supported keys: - -- `method`: one or more HTTP token values, comma-separated. `*` matches all methods. -- `domain`: hostname pattern. `*` can be a full label. -- `path`: one or more path patterns, comma-separated. - -Important matching semantics: - -- No matching allow rule means denied. -- `domain=github.com` matches only `github.com`. -- `domain=github.com` does not match `api.github.com`. -- `domain=*.github.com` matches subdomains like `api.github.com`. -- `domain=*.github.com` does not match the base domain `github.com`. -- To allow both a base domain and its subdomains, use two rules. -- Path wildcards are segment-based. A wildcard must be the entire segment. -- A path pattern ending in `*` can match additional path segments. - -When changing rule parsing or matching: - -- Update parser tests in `rulesengine/`. -- Update matcher tests in `rulesengine/`. -- Update README examples if user-visible behavior changes. -- Be careful with percent-encoded paths. 
Proxy forwarding preserves `RawPath` for cases like scoped npm package names. - -## Proxy - -`proxy/` contains the filtering proxy. It handles both transparent proxy traffic and explicit HTTP proxy traffic. - -Main files: - -- `proxy/proxy.go`: server lifecycle, TLS detection, HTTP and HTTPS processing, forwarding, block responses. -- `proxy/connect.go`: HTTP CONNECT tunnel support. -- `proxy/*_test.go`: proxy tests and framework. - -Request handling paths: - -1. Transparent HTTP: connection is not TLS, request is read directly, then evaluated. -2. Transparent HTTPS: first byte looks like TLS, boundary terminates TLS with a generated certificate, reads the HTTP request, then evaluates it. -3. Explicit HTTP proxy: client sends an absolute URL in the HTTP request. -4. Explicit HTTPS proxy: client sends CONNECT, boundary establishes a TLS tunnel, then reads HTTP requests inside the tunnel. - -Important proxy behavior: - -- Every request is audited before allow or deny handling completes. -- Audit sequence numbers are per proxy server instance and come from `audit.SequenceCounter`. -- Denied requests get a 403 response with suggested allow rules. -- Allowed requests are forwarded with a new upstream request. -- For GET and HEAD, forwarded request bodies are set to nil. -- Upstream responses are read fully so `Content-Length` can be set explicitly. -- Responses are normalized to HTTP/1.1 before writing back to the downstream client. -- Optional session-correlation headers are injected only when the request URL matches configured inject targets. - -When changing proxy behavior: - -- Prefer unit tests with `proxy/proxy_framework_test.go` and `httptest`. -- Avoid live network tests unless the behavior truly requires it. -- Test both allow and deny paths. -- Test both transparent and CONNECT paths when TLS behavior changes. -- Preserve audit behavior for both allowed and denied requests. - -## Audit - -`audit/` provides request auditing. 
- -Key types: - -- `audit.Request`: request decision payload. -- `audit.Auditor`: interface implemented by all auditors. -- `audit.LogAuditor`: writes structured logs through slog. -- `audit.SocketAuditor`: batches and forwards logs to the Coder workspace-agent socket. -- `audit.MultiAuditor`: fans out to multiple auditors. -- `audit.SequenceCounter`: atomic counter for per-request sequence numbers. - -Important behavior: - -- `SetupAuditor` always includes the log auditor. -- Socket forwarding is skipped when audit logs are disabled, the socket path is empty, or the socket does not exist. -- Socket auditor queues logs, batches them, retries connection failures, and reports drops. -- Allowed audit entries include the matching rule. -- Denied audit entries do not include a rule. -- Sequence numbers start at zero. - -When changing audit behavior: - -- Check `audit/socket_auditor_test.go` for batching, retry, drop, shutdown, and session ID expectations. -- Preserve the Coder boundary log proxy codec contract. -- Avoid blocking request handling on slow socket forwarding. - -## TLS - -`tls/` generates and loads certificates used for TLS interception. - -Key behavior: - -- A local CA is stored in the user's boundary config directory. -- Existing CA files are reused when possible. -- Per-host server certificates are generated on demand. -- The CA path is injected into child process environments so tools can trust boundary's generated certificates. - -When changing TLS behavior: - -- Preserve file ownership for the original user when running through sudo. -- Be careful with config directory paths from `config.UserInfo`. -- Consider the impact on curl, git, Python requests, and Node clients. -- Avoid broad certificate trust changes without explicit review. - -## nsjail backend - -`nsjail_manager/` is the default backend. - -Parent flow: - -1. Parse allow rules. -2. Build rule engine. -3. Set up audit. -4. Set up TLS and write CA certificate. -5. Create `nsjail.LinuxJail`. 
-6. Start the proxy. -7. Launch a child boundary process with `CHILD=true`. -8. Configure host-to-namespace communication after child PID exists. -9. Wait for child exit or signal. -10. Stop proxy and clean up iptables and veth state. - -Child flow: - -1. Wait for the jail-side veth interface. -2. Configure namespace networking. -3. Start dummy DNS and redirect DNS unless `--use-real-dns` is enabled. -4. Run the target command. - -Low-level networking behavior: - -- Host-side address: `192.168.100.1/24`. -- Jail-side address: `192.168.100.2/24`. -- Fixed subnet: `192.168.100.0/24`. -- TCP traffic from the jail is redirected to the local HTTP proxy with iptables. -- Non-TCP forwarding rules allow return traffic for non-TCP flows. -- Dummy DNS prevents DNS exfiltration by redirecting DNS to local dummy responses. - -High-risk details: - -- Interface names are constrained by Linux's 15-character interface name limit. -- iptables cleanup must mirror setup rules. -- `--no-user-namespace` changes clone flags and UID/GID mappings. -- `CAP_NET_ADMIN` and sometimes `CAP_SYS_ADMIN` are required. -- Non-HTTP TCP protocols are redirected but the proxy only understands HTTP and TLS-style traffic. - -## landjail backend - -`landjail/` uses Linux Landlock network restrictions. - -Differences from nsjail: - -- It does not set up transparent iptables routing. -- It sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` for the child. -- It clears `NO_PROXY` and `no_proxy` so clients do not bypass boundary. -- It configures CA-related environment variables for common clients. -- It restricts TCP connect to the proxy port. - -When changing landjail: - -- Check kernel and Landlock version assumptions. -- Preserve proxy env injection unless a task explicitly changes the model. -- Test that denied direct connections remain blocked. -- Remember that behavior depends on clients honoring proxy environment variables. 
- -## Privilege model - -`privilege/` handles Linux privilege escalation for the default nsjail backend. - -Behavior: - -- If needed, boundary re-execs through `sudo` and `setpriv`. -- It keeps the original user's UID/GID where possible. -- It adds ambient and inheritable capabilities required for network namespace and iptables work. -- Non-Linux builds use stubs. - -When changing privilege code: - -- Ask for review before implementation. -- Test both already-privileged and needs-escalation paths where possible. -- Preserve environment variables needed by child processes and the target command. -- Be cautious with PATH handling and sudo behavior. - -## Testing - -Normal validation: - -```sh -make unit-test -make build -``` - -Formatting and linting: - -```sh -make fmt -make fmt-check -make lint -``` - -E2E validation: - -```sh -make e2e-test -``` - -Important test facts: - -- `make unit-test` runs `go test -v -race $(go list ./... | grep -v e2e_tests)`. -- `make e2e-test` runs `sudo $(which go) test -v -race ./e2e_tests -count=1`. -- `make e2e-test` targets only the root `e2e_tests` package, not all subpackages. -- `make test-coverage` runs `go test -v -race -coverprofile=coverage.out ./...`, so it may include e2e packages. -- The Makefile currently does not define a `test` target. Do not use `make test` unless the Makefile changes. - -Testing guidance by area: - -- Rules changes: use parser and matcher tests in `rulesengine/`. -- Proxy changes: prefer `proxy/` unit tests with `httptest` and the proxy test framework. -- Config changes: use `config/*_test.go` and explicit environment slices. -- Audit changes: use `audit/*_test.go`, especially socket auditor behavior. -- nsjail and landjail changes: add focused unit tests where possible, then run e2e only on a suitable Linux sudo host. - -Avoid adding new sleeps in tests. Prefer readiness checks, channels, contexts, test servers, and explicit process state checks. 
Existing tests contain sleeps, but that should not become the default pattern for new code. - -## CI and releases - -CI lives in `.github/workflows/ci.yml`. - -Current CI behavior: - -- Uses Go 1.25. -- Runs `make deps`. -- Runs `make fmt-check` and `make lint` in the lint job. -- Installs `golangci-lint` before linting. -- Bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` before tests on Linux. -- Runs `make unit-test`. -- Runs `make e2e-test`. -- Runs `make build`. - -Build and release workflows: - -- `make build-all` builds Linux amd64 and Linux arm64 binaries. -- Build and release workflow files include Darwin artifact upload paths even though `make build-all` currently creates Linux binaries only. -- Release archives can be created from local `build/` output or downloaded workflow artifacts. - -When changing CI or releases: - -- Confirm Makefile targets exist before referencing them. -- Keep README, RELEASES, Makefile help, and workflows aligned. -- Avoid changing binary names or archive names without considering `install.sh`. -- Check whether artifacts are actually produced before uploading them. - -## Troubleshooting - -### `make test` fails with no rule - -Use `make unit-test` for regular tests. The current Makefile does not define `test`. - -### E2E tests fail with DNS issues - -CI bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` so namespace tests can reach upstream DNS instead of the host stub resolver. Local environments may need similar attention. - -### E2E tests leave host networking residue - -Inspect iptables and veth state. Cleanup should remove rules that setup added. Be careful before deleting unrelated host rules. - -### Boundary cannot escalate privileges - -Check that `sudo` and `setpriv` exist and that the current user can use sudo. The default nsjail backend needs capabilities for network setup. - -### Port conflicts - -Default proxy port is `8080`. Default pprof port is `6060`. 
Use CLI flags or environment variables when running multiple instances. - -### HTTPS clients reject certificates - -Check the CA path in the user config directory and the environment variables injected into the child process. Different clients use different CA variables. - -### Rules do not match as expected - -Check exact vs wildcard domain semantics first. `domain=github.com` and `domain=*.github.com` are different rules. - -## Agent failure catalog - -### Symptom: agent runs `make test` - -Cause: generic Go habit or stale README/help references. - -Fix: inspect the Makefile and run `make unit-test` for normal validation. Use e2e only when appropriate. - -### Symptom: agent runs e2e tests in an unsuitable environment - -Cause: treating e2e tests like normal unit tests. - -Fix: stop and verify Linux, sudo, iptables, namespace support, required tools, and cleanup expectations. - -### Symptom: proxy tests miss CONNECT or transparent TLS paths - -Cause: testing only one request path. - -Fix: add coverage for the path affected by the code change. TLS, HTTP, and CONNECT can differ. - -### Symptom: allow-rule change breaks subdomain behavior - -Cause: confusing exact domain and wildcard domain matching. - -Fix: update tests for base domain, subdomain, and unrelated domain cases. - -### Symptom: audit socket changes block request handling - -Cause: doing synchronous socket work in the request path. - -Fix: keep queueing and batching behavior. Preserve drop and retry tests. - -### Symptom: workflow uploads artifacts that were never built - -Cause: workflow artifact paths drift from `make build-all` outputs. - -Fix: align Makefile, workflow uploads, RELEASES, and install script expectations. - -## Review checklist - -Before opening a PR: - -- [ ] The change is narrow and avoids unrelated cleanup. -- [ ] `go fmt` or `make fmt` was run for Go changes. -- [ ] Focused tests were run for the changed area. 
-- [ ] `make unit-test` was run unless the change is docs-only and the user agreed to skip it. -- [ ] E2E tests were only run on a suitable Linux sudo host. -- [ ] README, Makefile, workflows, and release docs are aligned when commands or binaries change. -- [ ] Privilege, TLS, iptables, and rule grammar changes received explicit review. diff --git a/.claude/docs/BOUNDARY_AGENT_GUIDE.md b/.claude/docs/BOUNDARY_AGENT_GUIDE.md new file mode 120000 index 0000000..36fd01a --- /dev/null +++ b/.claude/docs/BOUNDARY_AGENT_GUIDE.md @@ -0,0 +1 @@ +../../docs/agent-guide.md \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index c8c63fa..aba34d9 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,59 +1,24 @@ # Boundary agent instructions -Boundary is a Linux network isolation tool for monitoring and restricting HTTP and HTTPS requests from child processes. It is security-sensitive code that can mutate host networking during e2e tests. +Boundary is a Linux network isolation tool for monitoring and restricting HTTP and HTTPS requests from child processes. This file is only the root entrypoint for agent runtimes. Keep canonical guidance in `docs/`. -Start here, read [ARCHITECTURE.md](ARCHITECTURE.md) for the human system overview, then read the relevant agent workflow sections in [.claude/docs/BOUNDARY_AGENT_GUIDE.md](.claude/docs/BOUNDARY_AGENT_GUIDE.md). +## Canonical docs + +- Human architecture overview: [docs/architecture.md](docs/architecture.md) +- Agent workflow guide: [docs/agent-guide.md](docs/agent-guide.md) +- E2E test safety guide: [docs/e2e-tests.md](docs/e2e-tests.md) ## Non-negotiable rules -- Do not run host-mutating e2e tests casually. They require Linux, sudo, iptables, network namespaces, and cleanup discipline. -- Do not assume `make test` exists. Use `make unit-test` for normal validation and `make e2e-test` only when a Linux sudo environment is appropriate. -- Do not skip `go fmt` for Go changes. -- Keep changes narrow. 
Avoid unrelated cleanup in security, networking, privilege, TLS, or audit code. -- Preserve Linux build tags in platform-specific files. +- Read [docs/agent-guide.md](docs/agent-guide.md) before making non-trivial changes. +- Read [docs/e2e-tests.md](docs/e2e-tests.md) before running or changing e2e tests. +- Use `make unit-test` for normal validation. Do not assume `make test` exists. - Ask before changing privilege escalation, iptables rules, certificate trust behavior, release workflow semantics, or the allow-rule grammar. -## Fast commands - -| Task | Command | Notes | -|------|---------|-------| -| Dependencies | `make deps` | Downloads and verifies Go modules | -| Build | `make build` | Builds `./boundary` for the current platform | -| Build all | `make build-all` | Builds Linux amd64 and arm64 binaries | -| Unit tests | `make unit-test` | Race-enabled tests excluding e2e packages | -| E2E tests | `make e2e-test` | Linux only, needs sudo, mutates host networking | -| Coverage | `make test-coverage` | Runs `go test ./...`; may include e2e packages | -| Format | `make fmt` | Runs `go fmt ./...` | -| Format check | `make fmt-check` | Uses `gofmt -l .` | -| Lint | `make lint` | Requires `golangci-lint` | -| Clean | `make clean` | Removes build and coverage artifacts | - -## Read before editing - -- Human architecture overview: [ARCHITECTURE.md](ARCHITECTURE.md) -- Repository map and agent architecture notes: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map](.claude/docs/BOUNDARY_AGENT_GUIDE.md#repository-map) -- Runtime flow: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow](.claude/docs/BOUNDARY_AGENT_GUIDE.md#runtime-flow) -- CLI and config: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config](.claude/docs/BOUNDARY_AGENT_GUIDE.md#cli-and-config) -- Rules engine: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine](.claude/docs/BOUNDARY_AGENT_GUIDE.md#rules-engine) -- Proxy behavior: 
[.claude/docs/BOUNDARY_AGENT_GUIDE.md#proxy](.claude/docs/BOUNDARY_AGENT_GUIDE.md#proxy) -- Audit logs: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#audit](.claude/docs/BOUNDARY_AGENT_GUIDE.md#audit) -- TLS certificates: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#tls](.claude/docs/BOUNDARY_AGENT_GUIDE.md#tls) -- nsjail backend: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#nsjail-backend](.claude/docs/BOUNDARY_AGENT_GUIDE.md#nsjail-backend) -- landjail backend: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#landjail-backend](.claude/docs/BOUNDARY_AGENT_GUIDE.md#landjail-backend) -- Testing: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#testing](.claude/docs/BOUNDARY_AGENT_GUIDE.md#testing) -- CI and releases: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#ci-and-releases](.claude/docs/BOUNDARY_AGENT_GUIDE.md#ci-and-releases) -- Troubleshooting: [.claude/docs/BOUNDARY_AGENT_GUIDE.md#troubleshooting](.claude/docs/BOUNDARY_AGENT_GUIDE.md#troubleshooting) - -## High-risk areas - -- `e2e_tests/`: read [e2e_tests/AGENTS.md](e2e_tests/AGENTS.md) first. -- `nsjail_manager/`: Linux namespaces, veth, iptables, dummy DNS, privilege-sensitive cleanup. -- `landjail/`: Landlock restrictions, proxy environment injection, and `NO_PROXY` clearing. -- `proxy/`: transparent proxying, explicit CONNECT, TLS MITM, audit sequencing, session-correlation headers. -- `rulesengine/`: exact and wildcard domain semantics. Grammar changes need broad test coverage. -- `tls/`: local CA lifecycle, generated certificates, ownership, and client trust behavior. -- `.github/workflows/`: release and build workflow changes can affect shipped binaries. - -## Compatibility +## Compatibility links -`CLAUDE.md` should mirror this file for Claude-style agent runtimes. `.agents/docs` points to `.claude/docs` for agent runtimes that look under `.agents`. +- `CLAUDE.md` points to this file for Claude-style agent runtimes. +- `ARCHITECTURE.md` points to `docs/architecture.md` for existing links. 
+- `.claude/docs/BOUNDARY_AGENT_GUIDE.md` points to `docs/agent-guide.md`. +- `.agents/docs` points to `.claude/docs` for agent runtimes that look under `.agents`. +- `e2e_tests/AGENTS.md` points to `docs/e2e-tests.md`. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md deleted file mode 100644 index 9b2c18b..0000000 --- a/ARCHITECTURE.md +++ /dev/null @@ -1,296 +0,0 @@ -# Boundary architecture - -Boundary is a Linux network isolation tool that runs a child process with restricted network access. It intercepts HTTP and HTTPS traffic, evaluates each request against allow rules, and records an audit trail of what was allowed or denied. - -The practical goal is simple: run an agent or command with a default-deny network policy while still letting approved HTTP and HTTPS requests work normally. - -## High-level model - -Boundary has three moving parts: - -1. **CLI and configuration** parse rules, logging options, jail options, and the target command. -2. **Jail backend** starts the target command in a restricted environment. -3. **Proxy and policy engine** inspect HTTP and HTTPS requests, allow or block them, and emit audit logs. - -```text -user shell - | - | boundary --allow "domain=github.com" [separator] command args... - v -boundary parent process - | - | parses config, creates policy engine, starts proxy, starts jail - v -restricted child process - | - | HTTP and HTTPS traffic - v -boundary proxy - | - | evaluates method, host, path - +--> allowed: forward to upstream server - +--> denied: return HTTP 403 and audit the denial -``` - -## Repository map - -| Path | Responsibility | -|------|----------------| -| `cmd/boundary/main.go` | Binary entrypoint. Builds and runs the CLI command. | -| `cli/` | Command-line interface, flags, environment variables, YAML config loading, and privilege setup. | -| `config/` | Runtime configuration, user information, and session-correlation settings. | -| `run/` | Platform dispatch. Linux runs a jail backend. 
Non-Linux returns an unsupported-platform error. | -| `rulesengine/` | Allow-rule parsing and matching. | -| `proxy/` | HTTP and HTTPS proxy, transparent TLS detection, CONNECT support, forwarding, blocking, auditing, and session-correlation header injection. | -| `audit/` | Structured stderr audit logging and optional Coder workspace-agent socket forwarding. | -| `tls/` | Local CA management and per-host certificate generation for HTTPS interception. | -| `nsjail_manager/` | Default jail backend using Linux network namespaces, veth pairs, iptables, and dummy DNS. | -| `landjail/` | Alternative jail backend using Landlock restrictions and proxy environment variables. | -| `privilege/` | Linux privilege escalation through `sudo` and `setpriv` for the default backend. | -| `dnsdummy/` | DNS server used by the namespace backend to prevent DNS exfiltration. | -| `e2e_tests/` | Linux integration tests that require sudo and can mutate host networking. | - -## Startup flow - -The startup path is: - -```text -cmd/boundary/main.go - -> cli.NewCommand - -> config.NewAppConfigFromCliConfig - -> privilege.EnsurePrivileges, for nsjail only - -> log.SetupLogging - -> run.Run - -> nsjail_manager.Run or landjail.Run -``` - -The CLI builds a `config.AppConfig` from flags, environment variables, optional YAML, and the target command. Then `run.Run` assigns a new session UUID and dispatches to the requested jail backend. - -The default jail type is `nsjail`. That backend needs Linux network privileges, so the CLI calls `privilege.EnsurePrivileges()` before entering the runtime. If the current process does not have the required capabilities, Boundary re-execs itself through `sudo` and `setpriv` with the minimal capabilities it needs for networking setup. - -The `landjail` backend does not use the same privilege escalation path. - -## Parent and child process model - -Both jail backends use a parent and child process model. 
The selected backend checks the `CHILD=true` environment variable to decide which role the current process should run. - -### Parent process - -The parent process owns setup and cleanup: - -1. Parse allow rules. -2. Create the rule engine. -3. Set up audit logging. -4. Create or load the local CA. -5. Start the HTTP proxy. -6. Start the child process. -7. Wait for the child process to exit or for a termination signal. -8. Stop the proxy. -9. Clean up backend-specific resources. - -### Child process - -The child process runs the target command inside the restricted environment. Backend-specific setup happens before the target command starts. - -For `nsjail`, the child configures namespace networking and DNS behavior before running the target. For `landjail`, the child applies Landlock network restrictions before running the target. - -## Policy model - -Boundary uses a default-deny policy. Requests are allowed only when at least one allow rule matches. - -Allow rules are strings made of key-value pairs: - -```text -method=GET,HEAD domain=github.com path=/api/* -``` - -Supported keys are: - -- `method`: one or more HTTP methods, comma-separated. `*` matches every method. -- `domain`: an exact host or wildcard host pattern. -- `path`: one or more path patterns, comma-separated. - -Important matching rules: - -- `domain=github.com` matches only `github.com`. -- `domain=github.com` does not match `api.github.com`. -- `domain=*.github.com` matches subdomains such as `api.github.com`. -- `domain=*.github.com` does not match `github.com`. -- To allow a base domain and its subdomains, configure both patterns. -- Path wildcards are segment-based. A wildcard must be a whole path segment. - -The engine returns both the allow or deny decision and the matching rule, if one matched. Audit logs include the matched rule for allowed requests. - -## Proxy model - -The proxy is the enforcement point for HTTP and HTTPS traffic. - -It supports two styles of traffic: - -1. 
**Transparent traffic**, where the target process does not know about the proxy. The `nsjail` backend redirects TCP traffic to Boundary with iptables. -2. **Explicit proxy traffic**, where the target process uses `HTTP_PROXY` and `HTTPS_PROXY`. The `landjail` backend uses this model. - -### HTTP requests - -For plain HTTP, the proxy reads the request, reconstructs the full URL when needed, evaluates the method, host, and path, then either forwards the request or returns a 403 response. - -### HTTPS requests - -For HTTPS, Boundary acts as a local TLS endpoint so it can inspect the HTTP request inside the encrypted stream. It uses a local CA and generates per-host certificates on demand. - -The target process must trust Boundary's CA. Boundary injects common CA environment variables into the child process so tools such as curl, git, Python requests, and Node can trust the generated certificates. - -### CONNECT requests - -When a client uses Boundary as an explicit HTTP proxy for HTTPS, it sends a CONNECT request. Boundary accepts the CONNECT tunnel, performs TLS with the client, reads HTTP requests from inside the tunnel, and evaluates each request independently. - -### Forwarding and blocking - -For allowed requests, the proxy creates a new upstream request, copies appropriate headers, optionally injects session-correlation headers, and writes the upstream response back to the client. - -For denied requests, the proxy returns HTTP 403 with a short message and example allow rules. - -Every request is audited before the allow or deny handling completes. - -## nsjail backend - -`nsjail` is the default backend. It provides transparent network interception with Linux networking primitives. 
- -The backend creates a point-to-point network between the host and child namespace: - -```text -host namespace child network namespace - -boundary proxy :8080 target command - ^ | - | | TCP traffic -iptables REDIRECT v - | veth jail side -veth host side -``` - -Key details: - -- The host side of the veth pair uses `192.168.100.1/24`. -- The child side uses `192.168.100.2/24`. -- The fixed subnet is `192.168.100.0/24`. -- iptables NAT and REDIRECT rules send TCP traffic from the child namespace to the Boundary proxy. -- Non-TCP forwarding rules allow return traffic for non-TCP flows. -- A dummy DNS server can run inside the namespace to prevent DNS exfiltration. -- `--use-real-dns` intentionally disables the dummy DNS behavior. -- `--no-user-namespace` disables user namespace creation for restricted environments. - -The parent process configures host-side networking before the child runs. Once the child process exists, the parent moves the jail-side veth into the child's network namespace. The child then configures its IP address, loopback, and default route. - -Cleanup removes the iptables rules and veth interface created during setup. - -## landjail backend - -`landjail` is an alternative backend based on Linux Landlock network restrictions. - -Unlike `nsjail`, it does not rely on transparent iptables redirection. Instead, it configures the child process to use Boundary as an explicit proxy: - -- `HTTP_PROXY` -- `HTTPS_PROXY` -- `http_proxy` -- `https_proxy` - -It also clears `NO_PROXY` and `no_proxy` so the target command cannot bypass Boundary through proxy bypass lists. - -Landlock restricts the child so it can connect only to the Boundary proxy port. This means the backend depends on clients honoring proxy environment variables. A client that ignores those variables will generally fail to connect rather than bypass Boundary. 
- -## TLS and certificate trust - -Boundary uses TLS interception for HTTPS so it can evaluate host, path, method, and headers in the request. - -The TLS manager: - -1. Finds the user's Boundary config directory. -2. Loads an existing local CA if present. -3. Generates a new local CA if needed. -4. Writes the CA certificate for child processes to trust. -5. Generates per-host certificates for incoming TLS connections. - -The jail backends set environment variables for common tools: - -- `SSL_CERT_FILE` -- `SSL_CERT_DIR` -- `CURL_CA_BUNDLE` -- `GIT_SSL_CAINFO` -- `REQUESTS_CA_BUNDLE` -- `NODE_EXTRA_CA_CERTS` - -When Boundary runs through sudo, ownership and paths must still refer to the original user, not root, where possible. - -## Audit logging - -Boundary audits every HTTP and HTTPS request that reaches the proxy. - -An audit record includes: - -- method -- URL -- host -- allowed or denied decision -- matching rule for allowed requests -- per-session sequence number - -Boundary always creates a stderr log auditor. When running inside a compatible Coder workspace, it can also forward audit batches to the workspace agent over a Unix socket. The workspace agent then forwards the logs to coderd for centralized logging. - -`--disable-audit-logs` disables socket forwarding. It does not remove stderr logging. - -## Session correlation - -The proxy package contains support for injecting session-correlation headers into selected outbound requests. This is intended for Coder AI Gateway flows where downstream services need to correlate a Boundary audit event with an upstream request. - -The headers are defined in `config/session_correlation.go`: - -- `X-Coder-Agent-Firewall-Session-Id` -- `X-Coder-Agent-Firewall-Sequence-Number` - -Injection targets use the same rule engine semantics as normal allow rules. 
When changing this area, verify the end-to-end runtime path from CLI config through the selected jail backend into `proxy.Config`; unit tests for proxy support alone are not enough. - -## Security properties and limitations - -Boundary is designed for HTTP and HTTPS control. The default policy is deny, but the enforcement point is the proxy and the selected jail backend. - -Important limitations: - -- Boundary is Linux-only for runtime enforcement. -- The `nsjail` backend redirects TCP traffic, but the proxy understands HTTP, HTTPS, and CONNECT-style traffic. Arbitrary non-HTTP TCP protocols are not supported as normal allowed traffic. -- DNS behavior is backend-specific. The namespace backend uses dummy DNS by default to reduce DNS exfiltration. `--use-real-dns` changes that intentionally. -- The landjail backend depends on clients using proxy environment variables. -- The fixed namespace subnet can conflict with local networking in unusual environments. -- E2E tests can mutate host networking and require careful cleanup. - -## Development notes - -Useful commands: - -```sh -make build -make unit-test -make fmt -make fmt-check -make lint -``` - -E2E tests require Linux and sudo: - -```sh -make e2e-test -``` - -Read `e2e_tests/AGENTS.md` before running or changing e2e tests. 
- -## Diagrams and related work - -The original design sketch is preserved here for context: - -Boundary - -Anthropic's sandbox runtime is a related architecture worth comparing when thinking about alternative isolation designs: - -https://github.com/anthropic-experimental/sandbox-runtime - -SRT diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 120000 index 0000000..6d39509 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1 @@ +docs/architecture.md \ No newline at end of file diff --git a/README.md b/README.md index 7c3474b..44d0b3c 100644 --- a/README.md +++ b/README.md @@ -184,7 +184,7 @@ make lint # Lint code ## Architecture -For detailed information about how `boundary` works internally, see [ARCHITECTURE.md](ARCHITECTURE.md). +For detailed information about how `boundary` works internally, see [docs/architecture.md](docs/architecture.md). ## License diff --git a/docs/agent-guide.md b/docs/agent-guide.md new file mode 100644 index 0000000..fc31c4f --- /dev/null +++ b/docs/agent-guide.md @@ -0,0 +1,429 @@ +# Boundary agent guide + +This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read [architecture.md](architecture.md). + +## Repository map + +| Path | Purpose | +|------|---------| +| `cmd/boundary/main.go` | Binary entrypoint. Creates the CLI command and exits with errors. | +| `cli/` | Serpent CLI, flags, environment variables, YAML config loading, privilege gate. | +| `config/` | App config, user info, session correlation config, header names. | +| `run/` | Platform dispatch. Linux runs a jail backend, non-Linux returns unsupported. | +| `proxy/` | HTTP and HTTPS filtering proxy, CONNECT support, TLS detection, audit, session correlation. | +| `rulesengine/` | Allow-rule parser and matcher. Default-deny policy. 
| +| `audit/` | Log auditor, socket auditor, multi-auditor, sequence counter. | +| `tls/` | Local CA creation/loading and per-host certificate generation. | +| `nsjail_manager/` | Default Linux namespace backend. Parent and child process orchestration. | +| `nsjail_manager/nsjail/` | Low-level veth, iptables, dummy DNS, env, and command runner code. | +| `landjail/` | Landlock backend using proxy env vars rather than transparent iptables routing. | +| `privilege/` | Linux privilege escalation through `sudo` and `setpriv`; non-Linux stubs. | +| `dnsdummy/` | Dummy DNS server used to prevent DNS exfiltration in namespace mode. | +| `log/` | slog setup to stderr or files. | +| `e2e_tests/` | Linux sudo tests that can mutate host networking. | +| `.github/workflows/` | CI, build, and release workflows. | +| `docs/architecture.md` | Human-facing overview of how Boundary works. | + +## Architecture overview + +Boundary runs a target command in a restricted environment and sends its HTTP and HTTPS traffic through a local filtering proxy. Requests are evaluated against allow rules. Anything that does not match an allow rule is denied. + +Core concepts: + +- Default deny: no rule means no outbound HTTP or HTTPS request is allowed. +- Parent process: sets up proxying, audit, TLS, and jail infrastructure. +- Child process: runs inside the selected jail backend and executes the target command. +- Proxy: parses requests, evaluates allow rules, audits decisions, forwards allowed traffic, and blocks denied traffic. +- Auditor: logs every request decision to stderr and optionally to the Coder workspace-agent socket. +- TLS manager: creates a local CA and per-host certificates so HTTPS can be inspected. + +Boundary has two jail backends: + +- `nsjail`: default. Uses Linux network namespaces, veth pairs, iptables NAT and REDIRECT rules, and optional user namespaces. +- `landjail`: uses Landlock network restrictions. 
It relies on proxy environment variables instead of transparent iptables redirection. + +## Runtime flow + +High-level flow: + +1. `cmd/boundary/main.go` calls `cli.NewCommand(version)`. +2. `cli/cli.go` parses flags, environment variables, and optional YAML config into `config.CliConfig`. +3. `config.NewAppConfigFromCliConfig` builds `config.AppConfig` and validates session-correlation config. +4. If jail type is `nsjail`, `privilege.EnsurePrivileges()` re-execs through `sudo` and `setpriv` when needed. +5. `run.Run` generates a boundary session UUID and dispatches to `nsjail_manager.Run` or `landjail.Run`. +6. The selected backend decides whether the current process is a parent or child by checking `CHILD=true`. +7. The parent parses allow rules, builds the rule engine, sets up auditors, creates TLS config, starts the proxy, then starts the child process. +8. The child applies jail-specific network setup and runs the target command. +9. The proxy evaluates each HTTP or HTTPS request and audits the result. +10. The parent stops the proxy and cleans up host resources when the target command exits or a signal is received. + +## CLI and config + +The CLI is built with `github.com/coder/serpent` in `cli/cli.go`. + +Important config types: + +- `config.CliConfig`: serpent values for flags, environment variables, and YAML. +- `config.AppConfig`: runtime config passed into the jail backend and proxy setup. +- `config.SessionCorrelationConfig`: controls session-correlation header injection. +- `config.UserInfo`: resolves the effective user, including sudo scenarios. + +Important CLI behavior: + +- `--allow` is repeatable and CLI-only. +- YAML `allowlist` is merged with CLI `--allow` rules. +- `--jail-type` defaults to `nsjail`. +- `--use-real-dns` intentionally permits DNS exfiltration. Do not enable it by accident. +- `--disable-audit-logs` disables workspace-agent socket forwarding. It does not remove stderr logging. 
+- `--enable-session-correlation` requires configured inject targets or a valid fallback from `CODER_AGENT_URL`. +- `--log-proxy-socket-path` defaults to the Coder workspace-agent boundary log proxy socket path. + +When changing CLI flags: + +- Update README usage if behavior changes. +- Add or update config tests if parsing or validation changes. +- Check environment variable names. Some are shared with the Coder workspace agent. +- Preserve backwards compatibility unless the task explicitly allows breaking it. + +## Rules engine + +`rulesengine/` parses and evaluates allow rules. + +Rule grammar uses key-value tokens: + +```text +method=GET,POST domain=github.com path=/api/* +``` + +Supported keys: + +- `method`: one or more HTTP token values, comma-separated. `*` matches all methods. +- `domain`: hostname pattern. `*` can be a full label. +- `path`: one or more path patterns, comma-separated. + +Important matching semantics: + +- No matching allow rule means denied. +- `domain=github.com` matches only `github.com`. +- `domain=github.com` does not match `api.github.com`. +- `domain=*.github.com` matches subdomains like `api.github.com`. +- `domain=*.github.com` does not match the base domain `github.com`. +- To allow both a base domain and its subdomains, use two rules. +- Path wildcards are segment-based. A wildcard must be the entire segment. +- A path pattern ending in `*` can match additional path segments. + +When changing rule parsing or matching: + +- Update parser tests in `rulesengine/`. +- Update matcher tests in `rulesengine/`. +- Update README examples if user-visible behavior changes. +- Be careful with percent-encoded paths. Proxy forwarding preserves `RawPath` for cases like scoped npm package names. + +## Proxy + +`proxy/` contains the filtering proxy. It handles both transparent proxy traffic and explicit HTTP proxy traffic. 
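As a rough illustration of the two traffic styles, the sketch below labels a parsed request by how the client addressed the proxy: a CONNECT request, an absolute-URL request, or an origin-form request that arrived without the client knowing about the proxy. The function names and labels are hypothetical and are not taken from `proxy/`; the real code may distinguish these cases differently (for example, by detecting TLS before any HTTP parsing).

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// classify labels a parsed request by how the client addressed the proxy.
// The labels are illustrative names for this sketch only.
func classify(req *http.Request) string {
	switch {
	case req.Method == http.MethodConnect:
		// Explicit HTTPS proxying starts with a CONNECT tunnel request.
		return "explicit-connect"
	case req.URL.IsAbs():
		// Explicit HTTP proxying uses an absolute URL in the request line.
		return "explicit-http"
	default:
		// Origin-form request: the client is not aware of the proxy.
		return "transparent"
	}
}

// classifyRaw parses one raw HTTP/1.1 request and classifies it.
func classifyRaw(raw string) (string, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return "", err
	}
	return classify(req), nil
}

func main() {
	samples := []string{
		"GET /repos HTTP/1.1\r\nHost: api.github.com\r\n\r\n",
		"GET http://example.com/index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
		"CONNECT github.com:443 HTTP/1.1\r\nHost: github.com:443\r\n\r\n",
	}
	for _, s := range samples {
		kind, err := classifyRaw(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(kind)
	}
}
```

A table-driven unit test over raw request strings like these avoids live network access, which fits the guidance to prefer `httptest`-style tests for proxy changes.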
+ +Main files: + +- `proxy/proxy.go`: server lifecycle, TLS detection, HTTP and HTTPS processing, forwarding, block responses. +- `proxy/connect.go`: HTTP CONNECT tunnel support. +- `proxy/*_test.go`: proxy tests and framework. + +Request handling paths: + +1. Transparent HTTP: connection is not TLS, request is read directly, then evaluated. +2. Transparent HTTPS: first byte looks like TLS, boundary terminates TLS with a generated certificate, reads the HTTP request, then evaluates it. +3. Explicit HTTP proxy: client sends an absolute URL in the HTTP request. +4. Explicit HTTPS proxy: client sends CONNECT, boundary establishes a TLS tunnel, then reads HTTP requests inside the tunnel. + +Important proxy behavior: + +- Every request is audited before allow or deny handling completes. +- Audit sequence numbers are per proxy server instance and come from `audit.SequenceCounter`. +- Denied requests get a 403 response with suggested allow rules. +- Allowed requests are forwarded with a new upstream request. +- For GET and HEAD, forwarded request bodies are set to nil. +- Upstream responses are read fully so `Content-Length` can be set explicitly. +- Responses are normalized to HTTP/1.1 before writing back to the downstream client. +- Optional session-correlation headers are injected only when the request URL matches configured inject targets. + +When changing proxy behavior: + +- Prefer unit tests with `proxy/proxy_framework_test.go` and `httptest`. +- Avoid live network tests unless the behavior truly requires it. +- Test both allow and deny paths. +- Test both transparent and CONNECT paths when TLS behavior changes. +- Preserve audit behavior for both allowed and denied requests. + +## Audit + +`audit/` provides request auditing. + +Key types: + +- `audit.Request`: request decision payload. +- `audit.Auditor`: interface implemented by all auditors. +- `audit.LogAuditor`: writes structured logs through slog. 
+- `audit.SocketAuditor`: batches and forwards logs to the Coder workspace-agent socket. +- `audit.MultiAuditor`: fans out to multiple auditors. +- `audit.SequenceCounter`: atomic counter for per-request sequence numbers. + +Important behavior: + +- `SetupAuditor` always includes the log auditor. +- Socket forwarding is skipped when audit logs are disabled, the socket path is empty, or the socket does not exist. +- Socket auditor queues logs, batches them, retries connection failures, and reports drops. +- Allowed audit entries include the matching rule. +- Denied audit entries do not include a rule. +- Sequence numbers start at zero. + +When changing audit behavior: + +- Check `audit/socket_auditor_test.go` for batching, retry, drop, shutdown, and session ID expectations. +- Preserve the Coder boundary log proxy codec contract. +- Avoid blocking request handling on slow socket forwarding. + +## TLS + +`tls/` generates and loads certificates used for TLS interception. + +Key behavior: + +- A local CA is stored in the user's boundary config directory. +- Existing CA files are reused when possible. +- Per-host server certificates are generated on demand. +- The CA path is injected into child process environments so tools can trust boundary's generated certificates. + +When changing TLS behavior: + +- Preserve file ownership for the original user when running through sudo. +- Be careful with config directory paths from `config.UserInfo`. +- Consider the impact on curl, git, Python requests, and Node clients. +- Avoid broad certificate trust changes without explicit review. + +## nsjail backend + +`nsjail_manager/` is the default backend. + +Parent flow: + +1. Parse allow rules. +2. Build rule engine. +3. Set up audit. +4. Set up TLS and write CA certificate. +5. Create `nsjail.LinuxJail`. +6. Start the proxy. +7. Launch a child boundary process with `CHILD=true`. +8. Configure host-to-namespace communication after child PID exists. +9. Wait for child exit or signal. 
+10. Stop proxy and clean up iptables and veth state. + +Child flow: + +1. Wait for the jail-side veth interface. +2. Configure namespace networking. +3. Start dummy DNS and redirect DNS unless `--use-real-dns` is enabled. +4. Run the target command. + +Low-level networking behavior: + +- Host-side address: `192.168.100.1/24`. +- Jail-side address: `192.168.100.2/24`. +- Fixed subnet: `192.168.100.0/24`. +- TCP traffic from the jail is redirected to the local HTTP proxy with iptables. +- Non-TCP forwarding rules allow return traffic for non-TCP flows. +- Dummy DNS prevents DNS exfiltration by redirecting DNS to local dummy responses. + +High-risk details: + +- Interface names are constrained by Linux's 15-character interface name limit. +- iptables cleanup must mirror setup rules. +- `--no-user-namespace` changes clone flags and UID/GID mappings. +- `CAP_NET_ADMIN` and sometimes `CAP_SYS_ADMIN` are required. +- Non-HTTP TCP protocols are redirected but the proxy only understands HTTP and TLS-style traffic. + +## landjail backend + +`landjail/` uses Linux Landlock network restrictions. + +Differences from nsjail: + +- It does not set up transparent iptables routing. +- It sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` for the child. +- It clears `NO_PROXY` and `no_proxy` so clients do not bypass boundary. +- It configures CA-related environment variables for common clients. +- It restricts TCP connect to the proxy port. + +When changing landjail: + +- Check kernel and Landlock version assumptions. +- Preserve proxy env injection unless a task explicitly changes the model. +- Test that denied direct connections remain blocked. +- Remember that behavior depends on clients honoring proxy environment variables. + +## Privilege model + +`privilege/` handles Linux privilege escalation for the default nsjail backend. + +Behavior: + +- If needed, boundary re-execs through `sudo` and `setpriv`. +- It keeps the original user's UID/GID where possible. 
+- It adds ambient and inheritable capabilities required for network namespace and iptables work. +- Non-Linux builds use stubs. + +When changing privilege code: + +- Ask for review before implementation. +- Test both already-privileged and needs-escalation paths where possible. +- Preserve environment variables needed by child processes and the target command. +- Be cautious with PATH handling and sudo behavior. + +## Testing + +Normal validation: + +```sh +make unit-test +make build +``` + +Formatting and linting: + +```sh +make fmt +make fmt-check +make lint +``` + +E2E validation: + +```sh +make e2e-test +``` + +Important test facts: + +- `make unit-test` runs `go test -v -race $(go list ./... | grep -v e2e_tests)`. +- `make e2e-test` runs `sudo $(which go) test -v -race ./e2e_tests -count=1`. +- `make e2e-test` targets only the root `e2e_tests` package, not all subpackages. +- `make test-coverage` runs `go test -v -race -coverprofile=coverage.out ./...`, so it may include e2e packages. +- The Makefile currently does not define a `test` target. Do not use `make test` unless the Makefile changes. + +Testing guidance by area: + +- Rules changes: use parser and matcher tests in `rulesengine/`. +- Proxy changes: prefer `proxy/` unit tests with `httptest` and the proxy test framework. +- Config changes: use `config/*_test.go` and explicit environment slices. +- Audit changes: use `audit/*_test.go`, especially socket auditor behavior. +- nsjail and landjail changes: add focused unit tests where possible, then run e2e only on a suitable Linux sudo host. + +Avoid adding new sleeps in tests. Prefer readiness checks, channels, contexts, test servers, and explicit process state checks. Existing tests contain sleeps, but that should not become the default pattern for new code. + +## CI and releases + +CI lives in `.github/workflows/ci.yml`. + +Current CI behavior: + +- Uses Go 1.25. +- Runs `make deps`. +- Runs `make fmt-check` and `make lint` in the lint job. 
+- Installs `golangci-lint` before linting. +- Bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` before tests on Linux. +- Runs `make unit-test`. +- Runs `make e2e-test`. +- Runs `make build`. + +Build and release workflows: + +- `make build-all` builds Linux amd64 and Linux arm64 binaries. +- Build and release workflow files include Darwin artifact upload paths even though `make build-all` currently creates Linux binaries only. +- Release archives can be created from local `build/` output or downloaded workflow artifacts. + +When changing CI or releases: + +- Confirm Makefile targets exist before referencing them. +- Keep README, RELEASES, Makefile help, and workflows aligned. +- Avoid changing binary names or archive names without considering `install.sh`. +- Check whether artifacts are actually produced before uploading them. + +## Troubleshooting + +### `make test` fails with no rule + +Use `make unit-test` for regular tests. The current Makefile does not define `test`. + +### E2E tests fail with DNS issues + +CI bind-mounts `/run/systemd/resolve/resolv.conf` over `/etc/resolv.conf` so namespace tests can reach upstream DNS instead of the host stub resolver. Local environments may need similar attention. + +### E2E tests leave host networking residue + +Inspect iptables and veth state. Cleanup should remove rules that setup added. Be careful before deleting unrelated host rules. + +### Boundary cannot escalate privileges + +Check that `sudo` and `setpriv` exist and that the current user can use sudo. The default nsjail backend needs capabilities for network setup. + +### Port conflicts + +Default proxy port is `8080`. Default pprof port is `6060`. Use CLI flags or environment variables when running multiple instances. + +### HTTPS clients reject certificates + +Check the CA path in the user config directory and the environment variables injected into the child process. Different clients use different CA variables. 
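A small diagnostic can make this check concrete. The sketch below reports which CA-related environment variables are unset or empty in the current process; the variable names are the ones the architecture notes list for common tools, while the helper name `missingCAVars` is hypothetical and not part of Boundary. It assumes the program is run as the target command inside the jail, so it sees the child's environment.

```go
package main

import (
	"fmt"
	"os"
)

// caEnvVars lists the CA-related variables named in the architecture notes.
var caEnvVars = []string{
	"SSL_CERT_FILE",
	"SSL_CERT_DIR",
	"CURL_CA_BUNDLE",
	"GIT_SSL_CAINFO",
	"REQUESTS_CA_BUNDLE",
	"NODE_EXTRA_CA_CERTS",
}

// missingCAVars returns the names from caEnvVars that lookup reports as
// unset or set to an empty string. lookup has the shape of os.LookupEnv.
func missingCAVars(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range caEnvVars {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingCAVars(os.LookupEnv)
	if len(missing) == 0 {
		fmt.Println("all CA-related variables are set")
		return
	}
	for _, name := range missing {
		fmt.Println("not set:", name)
	}
}
```

A variable being set does not prove the client trusts the CA; it only narrows down which client-specific variable to inspect next.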
+ +### Rules do not match as expected + +Check exact vs wildcard domain semantics first. `domain=github.com` and `domain=*.github.com` are different rules. + +## Agent failure catalog + +### Symptom: agent runs `make test` + +Cause: generic Go habit or stale README/help references. + +Fix: inspect the Makefile and run `make unit-test` for normal validation. Use e2e only when appropriate. + +### Symptom: agent runs e2e tests in an unsuitable environment + +Cause: treating e2e tests like normal unit tests. + +Fix: stop and verify Linux, sudo, iptables, namespace support, required tools, and cleanup expectations. + +### Symptom: proxy tests miss CONNECT or transparent TLS paths + +Cause: testing only one request path. + +Fix: add coverage for the path affected by the code change. TLS, HTTP, and CONNECT can differ. + +### Symptom: allow-rule change breaks subdomain behavior + +Cause: confusing exact domain and wildcard domain matching. + +Fix: update tests for base domain, subdomain, and unrelated domain cases. + +### Symptom: audit socket changes block request handling + +Cause: doing synchronous socket work in the request path. + +Fix: keep queueing and batching behavior. Preserve drop and retry tests. + +### Symptom: workflow uploads artifacts that were never built + +Cause: workflow artifact paths drift from `make build-all` outputs. + +Fix: align Makefile, workflow uploads, RELEASES, and install script expectations. + +## Review checklist + +Before opening a PR: + +- [ ] The change is narrow and avoids unrelated cleanup. +- [ ] `go fmt` or `make fmt` was run for Go changes. +- [ ] Focused tests were run for the changed area. +- [ ] `make unit-test` was run unless the change is docs-only and the user agreed to skip it. +- [ ] E2E tests were only run on a suitable Linux sudo host. +- [ ] README, Makefile, workflows, and release docs are aligned when commands or binaries change. +- [ ] Privilege, TLS, iptables, and rule grammar changes received explicit review. 
diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..59232fc --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,296 @@ +# Boundary architecture + +Boundary is a Linux network isolation tool that runs a child process with restricted network access. It intercepts HTTP and HTTPS traffic, evaluates each request against allow rules, and records an audit trail of what was allowed or denied. + +The practical goal is simple: run an agent or command with a default-deny network policy while still letting approved HTTP and HTTPS requests work normally. + +## High-level model + +Boundary has three moving parts: + +1. **CLI and configuration** parse rules, logging options, jail options, and the target command. +2. **Jail backend** starts the target command in a restricted environment. +3. **Proxy and policy engine** inspect HTTP and HTTPS requests, allow or block them, and emit audit logs. + +```text +user shell + | + | boundary --allow "domain=github.com" [separator] command args... + v +boundary parent process + | + | parses config, creates policy engine, starts proxy, starts jail + v +restricted child process + | + | HTTP and HTTPS traffic + v +boundary proxy + | + | evaluates method, host, path + +--> allowed: forward to upstream server + +--> denied: return HTTP 403 and audit the denial +``` + +## Repository map + +| Path | Responsibility | +|------|----------------| +| `cmd/boundary/main.go` | Binary entrypoint. Builds and runs the CLI command. | +| `cli/` | Command-line interface, flags, environment variables, YAML config loading, and privilege setup. | +| `config/` | Runtime configuration, user information, and session-correlation settings. | +| `run/` | Platform dispatch. Linux runs a jail backend. Non-Linux returns an unsupported-platform error. | +| `rulesengine/` | Allow-rule parsing and matching. 
| +| `proxy/` | HTTP and HTTPS proxy, transparent TLS detection, CONNECT support, forwarding, blocking, auditing, and session-correlation header injection. | +| `audit/` | Structured stderr audit logging and optional Coder workspace-agent socket forwarding. | +| `tls/` | Local CA management and per-host certificate generation for HTTPS interception. | +| `nsjail_manager/` | Default jail backend using Linux network namespaces, veth pairs, iptables, and dummy DNS. | +| `landjail/` | Alternative jail backend using Landlock restrictions and proxy environment variables. | +| `privilege/` | Linux privilege escalation through `sudo` and `setpriv` for the default backend. | +| `dnsdummy/` | DNS server used by the namespace backend to prevent DNS exfiltration. | +| `e2e_tests/` | Linux integration tests that require sudo and can mutate host networking. | + +## Startup flow + +The startup path is: + +```text +cmd/boundary/main.go + -> cli.NewCommand + -> config.NewAppConfigFromCliConfig + -> privilege.EnsurePrivileges, for nsjail only + -> log.SetupLogging + -> run.Run + -> nsjail_manager.Run or landjail.Run +``` + +The CLI builds a `config.AppConfig` from flags, environment variables, optional YAML, and the target command. Then `run.Run` assigns a new session UUID and dispatches to the requested jail backend. + +The default jail type is `nsjail`. That backend needs Linux network privileges, so the CLI calls `privilege.EnsurePrivileges()` before entering the runtime. If the current process does not have the required capabilities, Boundary re-execs itself through `sudo` and `setpriv` with the minimal capabilities it needs for networking setup. + +The `landjail` backend does not use the same privilege escalation path. + +## Parent and child process model + +Both jail backends use a parent and child process model. The selected backend checks the `CHILD=true` environment variable to decide which role the current process should run. 
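The role check described above can be sketched as follows. This is an illustration under assumptions: only the `CHILD=true` convention comes from this document, while `roleFromEnv` and the role strings are hypothetical names and do not reflect the actual backend code.

```go
package main

import (
	"fmt"
	"os"
)

// roleFromEnv (hypothetical helper) decides which role the current process
// should take. Only the exact value "true" selects the child role; any other
// value, including an unset variable, falls back to the parent role.
func roleFromEnv(getenv func(string) string) string {
	if getenv("CHILD") == "true" {
		return "child"
	}
	return "parent"
}

func main() {
	// The parent would re-exec the same binary with CHILD=true in the
	// environment; both processes then pass through this same check.
	fmt.Println("running as:", roleFromEnv(os.Getenv))
}
```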
+ +### Parent process + +The parent process owns setup and cleanup: + +1. Parse allow rules. +2. Create the rule engine. +3. Set up audit logging. +4. Create or load the local CA. +5. Start the HTTP proxy. +6. Start the child process. +7. Wait for the child process to exit or for a termination signal. +8. Stop the proxy. +9. Clean up backend-specific resources. + +### Child process + +The child process runs the target command inside the restricted environment. Backend-specific setup happens before the target command starts. + +For `nsjail`, the child configures namespace networking and DNS behavior before running the target. For `landjail`, the child applies Landlock network restrictions before running the target. + +## Policy model + +Boundary uses a default-deny policy. Requests are allowed only when at least one allow rule matches. + +Allow rules are strings made of key-value pairs: + +```text +method=GET,HEAD domain=github.com path=/api/* +``` + +Supported keys are: + +- `method`: one or more HTTP methods, comma-separated. `*` matches every method. +- `domain`: an exact host or wildcard host pattern. +- `path`: one or more path patterns, comma-separated. + +Important matching rules: + +- `domain=github.com` matches only `github.com`. +- `domain=github.com` does not match `api.github.com`. +- `domain=*.github.com` matches subdomains such as `api.github.com`. +- `domain=*.github.com` does not match `github.com`. +- To allow a base domain and its subdomains, configure both patterns. +- Path wildcards are segment-based. A wildcard must be a whole path segment. + +The engine returns both the allow or deny decision and the matching rule, if one matched. Audit logs include the matched rule for allowed requests. + +## Proxy model + +The proxy is the enforcement point for HTTP and HTTPS traffic. + +It supports two styles of traffic: + +1. **Transparent traffic**, where the target process does not know about the proxy. 
The `nsjail` backend redirects TCP traffic to Boundary with iptables. +2. **Explicit proxy traffic**, where the target process uses `HTTP_PROXY` and `HTTPS_PROXY`. The `landjail` backend uses this model. + +### HTTP requests + +For plain HTTP, the proxy reads the request, reconstructs the full URL when needed, evaluates the method, host, and path, then either forwards the request or returns a 403 response. + +### HTTPS requests + +For HTTPS, Boundary acts as a local TLS endpoint so it can inspect the HTTP request inside the encrypted stream. It uses a local CA and generates per-host certificates on demand. + +The target process must trust Boundary's CA. Boundary injects common CA environment variables into the child process so tools such as curl, git, Python requests, and Node can trust the generated certificates. + +### CONNECT requests + +When a client uses Boundary as an explicit HTTP proxy for HTTPS, it sends a CONNECT request. Boundary accepts the CONNECT tunnel, performs TLS with the client, reads HTTP requests from inside the tunnel, and evaluates each request independently. + +### Forwarding and blocking + +For allowed requests, the proxy creates a new upstream request, copies appropriate headers, optionally injects session-correlation headers, and writes the upstream response back to the client. + +For denied requests, the proxy returns HTTP 403 with a short message and example allow rules. + +Every request is audited before the allow or deny handling completes. + +## nsjail backend + +`nsjail` is the default backend. It provides transparent network interception with Linux networking primitives. + +The backend creates a point-to-point network between the host and child namespace: + +```text +host namespace child network namespace + +boundary proxy :8080 target command + ^ | + | | TCP traffic +iptables REDIRECT v + | veth jail side +veth host side +``` + +Key details: + +- The host side of the veth pair uses `192.168.100.1/24`. 
+- The child side uses `192.168.100.2/24`. +- The fixed subnet is `192.168.100.0/24`. +- iptables NAT and REDIRECT rules send TCP traffic from the child namespace to the Boundary proxy. +- Non-TCP forwarding rules allow return traffic for non-TCP flows. +- A dummy DNS server can run inside the namespace to prevent DNS exfiltration. +- `--use-real-dns` intentionally disables the dummy DNS behavior. +- `--no-user-namespace` disables user namespace creation for restricted environments. + +The parent process configures host-side networking before the child runs. Once the child process exists, the parent moves the jail-side veth into the child's network namespace. The child then configures its IP address, loopback, and default route. + +Cleanup removes the iptables rules and veth interface created during setup. + +## landjail backend + +`landjail` is an alternative backend based on Linux Landlock network restrictions. + +Unlike `nsjail`, it does not rely on transparent iptables redirection. Instead, it configures the child process to use Boundary as an explicit proxy: + +- `HTTP_PROXY` +- `HTTPS_PROXY` +- `http_proxy` +- `https_proxy` + +It also clears `NO_PROXY` and `no_proxy` so the target command cannot bypass Boundary through proxy bypass lists. + +Landlock restricts the child so it can connect only to the Boundary proxy port. This means the backend depends on clients honoring proxy environment variables. A client that ignores those variables will generally fail to connect rather than bypass Boundary. + +## TLS and certificate trust + +Boundary uses TLS interception for HTTPS so it can evaluate host, path, method, and headers in the request. + +The TLS manager: + +1. Finds the user's Boundary config directory. +2. Loads an existing local CA if present. +3. Generates a new local CA if needed. +4. Writes the CA certificate for child processes to trust. +5. Generates per-host certificates for incoming TLS connections. 
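As a rough illustration of steps 3 and 5, the sketch below creates an in-memory CA and issues a per-host certificate with Go's standard library. It is not Boundary's `tls/` implementation. The names `newCA`, `newHostCert`, and `verifyHost` are hypothetical, and the key type, validity periods, and serial numbers are assumptions chosen for brevity. Loading an existing CA and writing it to the config directory are left out.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate and its private key in memory.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1), // assumption: fixed serial for the sketch
		Subject:               pkix.Name{CommonName: "example local CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	// Template and parent are the same certificate, so the result is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newHostCert issues a server certificate for host, signed by the given CA.
func newHostCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey, host string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2), // assumption: real code should use unique serials
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyHost checks that leaf chains to ca and is valid for host, which is
// roughly what a client trusting the local CA would do.
func verifyHost(ca, leaf *x509.Certificate, host string) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: roots, DNSName: host})
	return err
}

func main() {
	ca, caKey, err := newCA()
	if err != nil {
		panic(err)
	}
	leaf, _, err := newHostCert(ca, caKey, "api.github.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("verify error:", verifyHost(ca, leaf, "api.github.com"))
}
```

The verification step shows why the child must trust the CA: without the CA in the client's root pool, the generated per-host certificate would be rejected.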
+ +The jail backends set environment variables for common tools: + +- `SSL_CERT_FILE` +- `SSL_CERT_DIR` +- `CURL_CA_BUNDLE` +- `GIT_SSL_CAINFO` +- `REQUESTS_CA_BUNDLE` +- `NODE_EXTRA_CA_CERTS` + +When Boundary runs through sudo, ownership and paths must still refer to the original user, not root, where possible. + +## Audit logging + +Boundary audits every HTTP and HTTPS request that reaches the proxy. + +An audit record includes: + +- method +- URL +- host +- allowed or denied decision +- matching rule for allowed requests +- per-session sequence number + +Boundary always creates a stderr log auditor. When running inside a compatible Coder workspace, it can also forward audit batches to the workspace agent over a Unix socket. The workspace agent then forwards the logs to coderd for centralized logging. + +`--disable-audit-logs` disables socket forwarding. It does not remove stderr logging. + +## Session correlation + +The proxy package contains support for injecting session-correlation headers into selected outbound requests. This is intended for Coder AI Gateway flows where downstream services need to correlate a Boundary audit event with an upstream request. + +The headers are defined in `config/session_correlation.go`: + +- `X-Coder-Agent-Firewall-Session-Id` +- `X-Coder-Agent-Firewall-Sequence-Number` + +Injection targets use the same rule engine semantics as normal allow rules. When changing this area, verify the end-to-end runtime path from CLI config through the selected jail backend into `proxy.Config`; unit tests for proxy support alone are not enough. + +## Security properties and limitations + +Boundary is designed for HTTP and HTTPS control. The default policy is deny, but the enforcement point is the proxy and the selected jail backend. + +Important limitations: + +- Boundary is Linux-only for runtime enforcement. +- The `nsjail` backend redirects TCP traffic, but the proxy understands HTTP, HTTPS, and CONNECT-style traffic. 
Arbitrary non-HTTP TCP protocols are not supported as normal allowed traffic. +- DNS behavior is backend-specific. The namespace backend uses dummy DNS by default to reduce DNS exfiltration. `--use-real-dns` changes that intentionally. +- The landjail backend depends on clients using proxy environment variables. +- The fixed namespace subnet can conflict with local networking in unusual environments. +- E2E tests can mutate host networking and require careful cleanup. + +## Development notes + +Useful commands: + +```sh +make build +make unit-test +make fmt +make fmt-check +make lint +``` + +E2E tests require Linux and sudo: + +```sh +make e2e-test +``` + +Read [e2e-tests.md](e2e-tests.md) before running or changing e2e tests. + +## Diagrams and related work + +The original design sketch is preserved here for context: + +Boundary + +Anthropic's sandbox runtime is a related architecture worth comparing when thinking about alternative isolation designs: + +https://github.com/anthropic-experimental/sandbox-runtime + +SRT diff --git a/docs/e2e-tests.md b/docs/e2e-tests.md new file mode 100644 index 0000000..c4f30da --- /dev/null +++ b/docs/e2e-tests.md @@ -0,0 +1,69 @@ +# Boundary e2e test guidance + +E2E tests in this directory are not normal unit tests. They can mutate host networking and require a suitable Linux sudo environment. + +Read this file before changing or running e2e tests. + +## Requirements + +Expected tools and host features include: + +- Linux +- sudo +- Go +- iptables +- ip +- nsenter +- curl +- dig +- nc +- Linux network namespaces +- Landlock support for landjail tests + +## Safety rules + +- Do not run e2e tests casually in a shared or fragile environment. +- Prefer focused package or test-name runs when debugging. +- Expect tests to create boundary binaries under temporary directories. +- Expect tests to create or inspect iptables rules, veth interfaces, and network namespaces. +- Check cleanup when a test fails or is interrupted. 
+- Do not delete unrelated host iptables rules during cleanup or debugging. + +## Commands + +The Makefile target is: + +```sh +make e2e-test +``` + +It currently runs: + +```sh +sudo $(which go) test -v -race ./e2e_tests -count=1 +``` + +That target runs the root `e2e_tests` package only. It does not run every e2e subpackage. If you need subpackage coverage, choose the package deliberately and document what you ran. + +Examples: + +```sh +sudo $(which go) test -v -race ./e2e_tests/nsjail -count=1 +sudo $(which go) test -v -race ./e2e_tests/landjail -count=1 +``` + +## Common pitfalls + +- DNS inside namespaces can fail if the host uses a stub resolver at `127.0.0.53`. +- iptables cleanup must remove exactly the rules added by setup. +- Port conflicts can occur when another boundary or proxy process is running. +- Existing sleeps in e2e helpers are not a pattern to copy. Prefer readiness checks when adding new tests. +- Some tests depend on external network behavior. Keep assertions focused and diagnostics clear. + +## When editing tests + +- Add targeted assertions for the behavior under test. +- Use unique ports, names, or temporary directories when tests can run concurrently. +- Preserve cleanup with `t.Cleanup` where possible. +- Capture enough diagnostics to debug host networking failures. +- Keep unit-level logic in package tests outside e2e when possible. diff --git a/e2e_tests/AGENTS.md b/e2e_tests/AGENTS.md deleted file mode 100644 index c4f30da..0000000 --- a/e2e_tests/AGENTS.md +++ /dev/null @@ -1,69 +0,0 @@ -# Boundary e2e test guidance - -E2E tests in this directory are not normal unit tests. They can mutate host networking and require a suitable Linux sudo environment. - -Read this file before changing or running e2e tests. 
- -## Requirements - -Expected tools and host features include: - -- Linux -- sudo -- Go -- iptables -- ip -- nsenter -- curl -- dig -- nc -- Linux network namespaces -- Landlock support for landjail tests - -## Safety rules - -- Do not run e2e tests casually in a shared or fragile environment. -- Prefer focused package or test-name runs when debugging. -- Expect tests to create boundary binaries under temporary directories. -- Expect tests to create or inspect iptables rules, veth interfaces, and network namespaces. -- Check cleanup when a test fails or is interrupted. -- Do not delete unrelated host iptables rules during cleanup or debugging. - -## Commands - -The Makefile target is: - -```sh -make e2e-test -``` - -It currently runs: - -```sh -sudo $(which go) test -v -race ./e2e_tests -count=1 -``` - -That target runs the root `e2e_tests` package only. It does not run every e2e subpackage. If you need subpackage coverage, choose the package deliberately and document what you ran. - -Examples: - -```sh -sudo $(which go) test -v -race ./e2e_tests/nsjail -count=1 -sudo $(which go) test -v -race ./e2e_tests/landjail -count=1 -``` - -## Common pitfalls - -- DNS inside namespaces can fail if the host uses a stub resolver at `127.0.0.53`. -- iptables cleanup must remove exactly the rules added by setup. -- Port conflicts can occur when another boundary or proxy process is running. -- Existing sleeps in e2e helpers are not a pattern to copy. Prefer readiness checks when adding new tests. -- Some tests depend on external network behavior. Keep assertions focused and diagnostics clear. - -## When editing tests - -- Add targeted assertions for the behavior under test. -- Use unique ports, names, or temporary directories when tests can run concurrently. -- Preserve cleanup with `t.Cleanup` where possible. -- Capture enough diagnostics to debug host networking failures. -- Keep unit-level logic in package tests outside e2e when possible. 
diff --git a/e2e_tests/AGENTS.md b/e2e_tests/AGENTS.md new file mode 120000 index 0000000..cbe6498 --- /dev/null +++ b/e2e_tests/AGENTS.md @@ -0,0 +1 @@ +../docs/e2e-tests.md \ No newline at end of file From 7031115063d90704ad5ad84246de94ff89c69922 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:23:27 +0000 Subject: [PATCH 4/8] docs: remove markdown hrefs --- AGENTS.md | 10 +++++----- README.md | 2 +- docs/agent-guide.md | 2 +- docs/architecture.md | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index aba34d9..1453bbc 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,14 +4,14 @@ Boundary is a Linux network isolation tool for monitoring and restricting HTTP a ## Canonical docs -- Human architecture overview: [docs/architecture.md](docs/architecture.md) -- Agent workflow guide: [docs/agent-guide.md](docs/agent-guide.md) -- E2E test safety guide: [docs/e2e-tests.md](docs/e2e-tests.md) +- Human architecture overview: `docs/architecture.md` +- Agent workflow guide: `docs/agent-guide.md` +- E2E test safety guide: `docs/e2e-tests.md` ## Non-negotiable rules -- Read [docs/agent-guide.md](docs/agent-guide.md) before making non-trivial changes. -- Read [docs/e2e-tests.md](docs/e2e-tests.md) before running or changing e2e tests. +- Read `docs/agent-guide.md` before making non-trivial changes. +- Read `docs/e2e-tests.md` before running or changing e2e tests. - Use `make unit-test` for normal validation. Do not assume `make test` exists. - Ask before changing privilege escalation, iptables rules, certificate trust behavior, release workflow semantics, or the allow-rule grammar. diff --git a/README.md b/README.md index 44d0b3c..a91a374 100644 --- a/README.md +++ b/README.md @@ -184,7 +184,7 @@ make lint # Lint code ## Architecture -For detailed information about how `boundary` works internally, see [docs/architecture.md](docs/architecture.md). 
+For detailed information about how `boundary` works internally, see `docs/architecture.md`. ## License diff --git a/docs/agent-guide.md b/docs/agent-guide.md index fc31c4f..489fabb 100644 --- a/docs/agent-guide.md +++ b/docs/agent-guide.md @@ -1,6 +1,6 @@ # Boundary agent guide -This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read [architecture.md](architecture.md). +This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read `docs/architecture.md`. ## Repository map diff --git a/docs/architecture.md b/docs/architecture.md index 59232fc..b61421f 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -281,7 +281,7 @@ E2E tests require Linux and sudo: make e2e-test ``` -Read [e2e-tests.md](e2e-tests.md) before running or changing e2e tests. +Read `docs/e2e-tests.md` before running or changing e2e tests. 
## Diagrams and related work From 4d16371d21b1ebf12ae0febde823fe1ef09dd26a Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:28:53 +0000 Subject: [PATCH 5/8] docs: align entrypoints with docs layout --- .agents/docs | 2 +- .claude/docs/BOUNDARY_AGENT_GUIDE.md | 1 - AGENTS.md | 6 ++---- ARCHITECTURE.md | 1 - 4 files changed, 3 insertions(+), 7 deletions(-) delete mode 120000 .claude/docs/BOUNDARY_AGENT_GUIDE.md delete mode 120000 ARCHITECTURE.md diff --git a/.agents/docs b/.agents/docs index daf0269..a9594bf 120000 --- a/.agents/docs +++ b/.agents/docs @@ -1 +1 @@ -../.claude/docs \ No newline at end of file +../docs \ No newline at end of file diff --git a/.claude/docs/BOUNDARY_AGENT_GUIDE.md b/.claude/docs/BOUNDARY_AGENT_GUIDE.md deleted file mode 120000 index 36fd01a..0000000 --- a/.claude/docs/BOUNDARY_AGENT_GUIDE.md +++ /dev/null @@ -1 +0,0 @@ -../../docs/agent-guide.md \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 1453bbc..0669067 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,10 +15,8 @@ Boundary is a Linux network isolation tool for monitoring and restricting HTTP a - Use `make unit-test` for normal validation. Do not assume `make test` exists. - Ask before changing privilege escalation, iptables rules, certificate trust behavior, release workflow semantics, or the allow-rule grammar. -## Compatibility links +## Entrypoints - `CLAUDE.md` points to this file for Claude-style agent runtimes. -- `ARCHITECTURE.md` points to `docs/architecture.md` for existing links. -- `.claude/docs/BOUNDARY_AGENT_GUIDE.md` points to `docs/agent-guide.md`. -- `.agents/docs` points to `.claude/docs` for agent runtimes that look under `.agents`. +- `.agents/docs` points to `docs/` for agent runtimes that look under `.agents`. - `e2e_tests/AGENTS.md` points to `docs/e2e-tests.md`. 
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md deleted file mode 120000 index 6d39509..0000000 --- a/ARCHITECTURE.md +++ /dev/null @@ -1 +0,0 @@ -docs/architecture.md \ No newline at end of file From 34bd66f52f58ae364f88372305aa08939289ddd3 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:39:06 +0000 Subject: [PATCH 6/8] fix(docs): deduplicate agent-guide, fix README link and diagram placeholder - README.md: restore clickable Markdown link to docs/architecture.md - docs/architecture.md: replace [separator] with actual CLI separator (--) - docs/agent-guide.md: replace duplicated architecture/runtime/proxy/audit/ TLS/backend/privilege sections with cross-references to architecture.md, keeping only agent-specific workflow, change guidance, testing, troubleshooting, and failure catalog --- README.md | 2 +- docs/agent-guide.md | 189 ++++--------------------------------------- docs/architecture.md | 2 +- 3 files changed, 16 insertions(+), 177 deletions(-) diff --git a/README.md b/README.md index a91a374..44d0b3c 100644 --- a/README.md +++ b/README.md @@ -184,7 +184,7 @@ make lint # Lint code ## Architecture -For detailed information about how `boundary` works internally, see `docs/architecture.md`. +For detailed information about how `boundary` works internally, see [docs/architecture.md](docs/architecture.md). ## License diff --git a/docs/agent-guide.md b/docs/agent-guide.md index 489fabb..0b4ea2a 100644 --- a/docs/agent-guide.md +++ b/docs/agent-guide.md @@ -2,60 +2,11 @@ This guide gives autonomous agents the context needed to change `github.com/coder/boundary` safely. It is intentionally consolidated so agents can load one detailed handbook after reading the root `AGENTS.md`. For a human-facing system overview, read `docs/architecture.md`. -## Repository map - -| Path | Purpose | -|------|---------| -| `cmd/boundary/main.go` | Binary entrypoint. Creates the CLI command and exits with errors. 
| -| `cli/` | Serpent CLI, flags, environment variables, YAML config loading, privilege gate. | -| `config/` | App config, user info, session correlation config, header names. | -| `run/` | Platform dispatch. Linux runs a jail backend, non-Linux returns unsupported. | -| `proxy/` | HTTP and HTTPS filtering proxy, CONNECT support, TLS detection, audit, session correlation. | -| `rulesengine/` | Allow-rule parser and matcher. Default-deny policy. | -| `audit/` | Log auditor, socket auditor, multi-auditor, sequence counter. | -| `tls/` | Local CA creation/loading and per-host certificate generation. | -| `nsjail_manager/` | Default Linux namespace backend. Parent and child process orchestration. | -| `nsjail_manager/nsjail/` | Low-level veth, iptables, dummy DNS, env, and command runner code. | -| `landjail/` | Landlock backend using proxy env vars rather than transparent iptables routing. | -| `privilege/` | Linux privilege escalation through `sudo` and `setpriv`; non-Linux stubs. | -| `dnsdummy/` | Dummy DNS server used to prevent DNS exfiltration in namespace mode. | -| `log/` | slog setup to stderr or files. | -| `e2e_tests/` | Linux sudo tests that can mutate host networking. | -| `.github/workflows/` | CI, build, and release workflows. | -| `docs/architecture.md` | Human-facing overview of how Boundary works. | - -## Architecture overview - -Boundary runs a target command in a restricted environment and sends its HTTP and HTTPS traffic through a local filtering proxy. Requests are evaluated against allow rules. Anything that does not match an allow rule is denied. - -Core concepts: - -- Default deny: no rule means no outbound HTTP or HTTPS request is allowed. -- Parent process: sets up proxying, audit, TLS, and jail infrastructure. -- Child process: runs inside the selected jail backend and executes the target command. -- Proxy: parses requests, evaluates allow rules, audits decisions, forwards allowed traffic, and blocks denied traffic. 
-- Auditor: logs every request decision to stderr and optionally to the Coder workspace-agent socket. -- TLS manager: creates a local CA and per-host certificates so HTTPS can be inspected. - -Boundary has two jail backends: - -- `nsjail`: default. Uses Linux network namespaces, veth pairs, iptables NAT and REDIRECT rules, and optional user namespaces. -- `landjail`: uses Landlock network restrictions. It relies on proxy environment variables instead of transparent iptables redirection. - -## Runtime flow - -High-level flow: - -1. `cmd/boundary/main.go` calls `cli.NewCommand(version)`. -2. `cli/cli.go` parses flags, environment variables, and optional YAML config into `config.CliConfig`. -3. `config.NewAppConfigFromCliConfig` builds `config.AppConfig` and validates session-correlation config. -4. If jail type is `nsjail`, `privilege.EnsurePrivileges()` re-execs through `sudo` and `setpriv` when needed. -5. `run.Run` generates a boundary session UUID and dispatches to `nsjail_manager.Run` or `landjail.Run`. -6. The selected backend decides whether the current process is a parent or child by checking `CHILD=true`. -7. The parent parses allow rules, builds the rule engine, sets up auditors, creates TLS config, starts the proxy, then starts the child process. -8. The child applies jail-specific network setup and runs the target command. -9. The proxy evaluates each HTTP or HTTPS request and audits the result. -10. The parent stops the proxy and cleans up host resources when the target command exits or a signal is received. +## Architecture and runtime + +See [docs/architecture.md](architecture.md) for the repository map, high-level model, startup flow, parent/child process model, policy model, proxy model, backend details, TLS, audit logging, session correlation, and security limitations. + +The rest of this guide focuses on agent-specific workflow, change guidance, testing, and troubleshooting. 
## CLI and config @@ -87,30 +38,7 @@ When changing CLI flags: ## Rules engine -`rulesengine/` parses and evaluates allow rules. - -Rule grammar uses key-value tokens: - -```text -method=GET,POST domain=github.com path=/api/* -``` - -Supported keys: - -- `method`: one or more HTTP token values, comma-separated. `*` matches all methods. -- `domain`: hostname pattern. `*` can be a full label. -- `path`: one or more path patterns, comma-separated. - -Important matching semantics: - -- No matching allow rule means denied. -- `domain=github.com` matches only `github.com`. -- `domain=github.com` does not match `api.github.com`. -- `domain=*.github.com` matches subdomains like `api.github.com`. -- `domain=*.github.com` does not match the base domain `github.com`. -- To allow both a base domain and its subdomains, use two rules. -- Path wildcards are segment-based. A wildcard must be the entire segment. -- A path pattern ending in `*` can match additional path segments. +See the Policy model section in [docs/architecture.md](architecture.md) for the allow-rule grammar and matching semantics. When changing rule parsing or matching: @@ -121,31 +49,9 @@ When changing rule parsing or matching: ## Proxy -`proxy/` contains the filtering proxy. It handles both transparent proxy traffic and explicit HTTP proxy traffic. - -Main files: - -- `proxy/proxy.go`: server lifecycle, TLS detection, HTTP and HTTPS processing, forwarding, block responses. -- `proxy/connect.go`: HTTP CONNECT tunnel support. -- `proxy/*_test.go`: proxy tests and framework. - -Request handling paths: +See the Proxy model section in [docs/architecture.md](architecture.md) for HTTP, HTTPS, and CONNECT request paths and forwarding/blocking behavior. -1. Transparent HTTP: connection is not TLS, request is read directly, then evaluated. -2. Transparent HTTPS: first byte looks like TLS, boundary terminates TLS with a generated certificate, reads the HTTP request, then evaluates it. -3. 
Explicit HTTP proxy: client sends an absolute URL in the HTTP request. -4. Explicit HTTPS proxy: client sends CONNECT, boundary establishes a TLS tunnel, then reads HTTP requests inside the tunnel. - -Important proxy behavior: - -- Every request is audited before allow or deny handling completes. -- Audit sequence numbers are per proxy server instance and come from `audit.SequenceCounter`. -- Denied requests get a 403 response with suggested allow rules. -- Allowed requests are forwarded with a new upstream request. -- For GET and HEAD, forwarded request bodies are set to nil. -- Upstream responses are read fully so `Content-Length` can be set explicitly. -- Responses are normalized to HTTP/1.1 before writing back to the downstream client. -- Optional session-correlation headers are injected only when the request URL matches configured inject targets. +Key files: `proxy/proxy.go`, `proxy/connect.go`, `proxy/*_test.go`. When changing proxy behavior: @@ -157,25 +63,9 @@ When changing proxy behavior: ## Audit -`audit/` provides request auditing. - -Key types: +See the Audit logging section in [docs/architecture.md](architecture.md) for the audit model. -- `audit.Request`: request decision payload. -- `audit.Auditor`: interface implemented by all auditors. -- `audit.LogAuditor`: writes structured logs through slog. -- `audit.SocketAuditor`: batches and forwards logs to the Coder workspace-agent socket. -- `audit.MultiAuditor`: fans out to multiple auditors. -- `audit.SequenceCounter`: atomic counter for per-request sequence numbers. - -Important behavior: - -- `SetupAuditor` always includes the log auditor. -- Socket forwarding is skipped when audit logs are disabled, the socket path is empty, or the socket does not exist. -- Socket auditor queues logs, batches them, retries connection failures, and reports drops. -- Allowed audit entries include the matching rule. -- Denied audit entries do not include a rule. -- Sequence numbers start at zero. 
+Key types: `audit.Request`, `audit.Auditor`, `audit.LogAuditor`, `audit.SocketAuditor`, `audit.MultiAuditor`, `audit.SequenceCounter`. When changing audit behavior: @@ -185,14 +75,7 @@ When changing audit behavior: ## TLS -`tls/` generates and loads certificates used for TLS interception. - -Key behavior: - -- A local CA is stored in the user's boundary config directory. -- Existing CA files are reused when possible. -- Per-host server certificates are generated on demand. -- The CA path is injected into child process environments so tools can trust boundary's generated certificates. +See the TLS and certificate trust section in [docs/architecture.md](architecture.md) for the CA and certificate model. When changing TLS behavior: @@ -203,36 +86,7 @@ When changing TLS behavior: ## nsjail backend -`nsjail_manager/` is the default backend. - -Parent flow: - -1. Parse allow rules. -2. Build rule engine. -3. Set up audit. -4. Set up TLS and write CA certificate. -5. Create `nsjail.LinuxJail`. -6. Start the proxy. -7. Launch a child boundary process with `CHILD=true`. -8. Configure host-to-namespace communication after child PID exists. -9. Wait for child exit or signal. -10. Stop proxy and clean up iptables and veth state. - -Child flow: - -1. Wait for the jail-side veth interface. -2. Configure namespace networking. -3. Start dummy DNS and redirect DNS unless `--use-real-dns` is enabled. -4. Run the target command. - -Low-level networking behavior: - -- Host-side address: `192.168.100.1/24`. -- Jail-side address: `192.168.100.2/24`. -- Fixed subnet: `192.168.100.0/24`. -- TCP traffic from the jail is redirected to the local HTTP proxy with iptables. -- Non-TCP forwarding rules allow return traffic for non-TCP flows. -- Dummy DNS prevents DNS exfiltration by redirecting DNS to local dummy responses. +See the nsjail backend section in [docs/architecture.md](architecture.md) for the namespace, veth, iptables, and DNS model. 
High-risk details: @@ -244,15 +98,7 @@ High-risk details: ## landjail backend -`landjail/` uses Linux Landlock network restrictions. - -Differences from nsjail: - -- It does not set up transparent iptables routing. -- It sets `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` for the child. -- It clears `NO_PROXY` and `no_proxy` so clients do not bypass boundary. -- It configures CA-related environment variables for common clients. -- It restricts TCP connect to the proxy port. +See the landjail backend section in [docs/architecture.md](architecture.md) for the Landlock and proxy-env model. When changing landjail: @@ -263,14 +109,7 @@ When changing landjail: ## Privilege model -`privilege/` handles Linux privilege escalation for the default nsjail backend. - -Behavior: - -- If needed, boundary re-execs through `sudo` and `setpriv`. -- It keeps the original user's UID/GID where possible. -- It adds ambient and inheritable capabilities required for network namespace and iptables work. -- Non-Linux builds use stubs. +See the Startup flow section in [docs/architecture.md](architecture.md) for how privilege escalation fits into the runtime. When changing privilege code: diff --git a/docs/architecture.md b/docs/architecture.md index b61421f..e9b36e0 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -15,7 +15,7 @@ Boundary has three moving parts: ```text user shell | - | boundary --allow "domain=github.com" [separator] command args... + | boundary --allow "domain=github.com" -- command args... 
v boundary parent process | From f4d9a06715c4517f20dc3394ba331ac4839de2b7 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:41:59 +0000 Subject: [PATCH 7/8] fix(docs): correct content accuracy against codebase - architecture.md: add missing log/ and nsjail_manager/nsjail/ to repo map - architecture.md: document that domain wildcard matches deeper subdomains - architecture.md: document that trailing path wildcard matches multiple segments - architecture.md: clarify CONNECT handshake requests are not audited - agent-guide.md: note make ci also fails due to missing test target --- docs/agent-guide.md | 2 +- docs/architecture.md | 9 ++++++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/docs/agent-guide.md b/docs/agent-guide.md index 0b4ea2a..b635aeb 100644 --- a/docs/agent-guide.md +++ b/docs/agent-guide.md @@ -191,7 +191,7 @@ When changing CI or releases: ### `make test` fails with no rule -Use `make unit-test` for regular tests. The current Makefile does not define `test`. +Use `make unit-test` for regular tests. The current Makefile does not define a `test` target. Note that `make ci` also depends on `test`, so it will fail for the same reason; use individual targets instead. ### E2E tests fail with DNS issues diff --git a/docs/architecture.md b/docs/architecture.md index e9b36e0..df75a05 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -44,10 +44,12 @@ boundary proxy | `proxy/` | HTTP and HTTPS proxy, transparent TLS detection, CONNECT support, forwarding, blocking, auditing, and session-correlation header injection. | | `audit/` | Structured stderr audit logging and optional Coder workspace-agent socket forwarding. | | `tls/` | Local CA management and per-host certificate generation for HTTPS interception. | -| `nsjail_manager/` | Default jail backend using Linux network namespaces, veth pairs, iptables, and dummy DNS. | +| `nsjail_manager/` | Default jail backend. 
Parent/child orchestration, proxy setup, and cleanup. | +| `nsjail_manager/nsjail/` | Low-level Linux namespace networking: veth, iptables, dummy DNS, env, and command runner. | | `landjail/` | Alternative jail backend using Landlock restrictions and proxy environment variables. | | `privilege/` | Linux privilege escalation through `sudo` and `setpriv` for the default backend. | | `dnsdummy/` | DNS server used by the namespace backend to prevent DNS exfiltration. | +| `log/` | slog setup for stderr and file logging. | | `e2e_tests/` | Linux integration tests that require sudo and can mutate host networking. | ## Startup flow @@ -114,10 +116,11 @@ Important matching rules: - `domain=github.com` matches only `github.com`. - `domain=github.com` does not match `api.github.com`. -- `domain=*.github.com` matches subdomains such as `api.github.com`. +- `domain=*.github.com` matches subdomains such as `api.github.com` and deeper subdomains such as `v1.api.github.com`. - `domain=*.github.com` does not match `github.com`. - To allow a base domain and its subdomains, configure both patterns. - Path wildcards are segment-based. A wildcard must be a whole path segment. +- A trailing `*` segment matches multiple remaining segments: `path=/api/*` matches `/api/v1/users`. The engine returns both the allow or deny decision and the matching rule, if one matched. Audit logs include the matched rule for allowed requests. @@ -150,7 +153,7 @@ For allowed requests, the proxy creates a new upstream request, copies appropria For denied requests, the proxy returns HTTP 403 with a short message and example allow rules. -Every request is audited before the allow or deny handling completes. +Every HTTP request that reaches the proxy is audited before the allow or deny handling completes. CONNECT handshake requests themselves are not audited; only the HTTP requests inside the resulting tunnel are audited. 
## nsjail backend From 203d546dd8176c702da36db91b6cc8b4c2581b88 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 May 2026 11:45:12 +0000 Subject: [PATCH 8/8] fix(docs): remove pointless Entrypoints section from AGENTS.md Agents reading AGENTS.md are already at the right file. Documenting that CLAUDE.md symlinks here or that .agents/docs exists is meta-information about the file layout that does not help agents do actual work. --- AGENTS.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 0669067..f62e70b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,8 +15,4 @@ Boundary is a Linux network isolation tool for monitoring and restricting HTTP a - Use `make unit-test` for normal validation. Do not assume `make test` exists. - Ask before changing privilege escalation, iptables rules, certificate trust behavior, release workflow semantics, or the allow-rule grammar. -## Entrypoints -- `CLAUDE.md` points to this file for Claude-style agent runtimes. -- `.agents/docs` points to `docs/` for agent runtimes that look under `.agents`. -- `e2e_tests/AGENTS.md` points to `docs/e2e-tests.md`.
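Review note on the matching rules documented in patch 7/8: the domain and path semantics added to `docs/architecture.md` can be sketched in Go as below. This is a minimal illustration, not the `rulesengine/` implementation. `matchDomain` and `matchPath` are hypothetical helper names. The sketch assumes that a non-trailing `*` segment matches exactly one path segment and that a trailing `*` needs at least one remaining segment; neither detail is stated in the patch.

```go
package main

import (
	"fmt"
	"strings"
)

// matchDomain is a hypothetical helper, not part of rulesengine.
// A pattern of the form "*.example.com" matches subdomains at any depth
// but never the base domain; any other pattern must match exactly.
func matchDomain(pattern, host string) bool {
	if strings.HasPrefix(pattern, "*.") {
		suffix := strings.TrimPrefix(pattern, "*.")
		// Require a non-empty label in front of ".suffix".
		return strings.HasSuffix(host, "."+suffix) && len(host) > len(suffix)+1
	}
	return pattern == host
}

// matchPath is a hypothetical helper, not part of rulesengine.
// Wildcards are whole segments. Assumption: a trailing "*" matches one or
// more remaining segments, and a non-trailing "*" matches exactly one.
func matchPath(pattern, path string) bool {
	pat := strings.Split(strings.Trim(pattern, "/"), "/")
	seg := strings.Split(strings.Trim(path, "/"), "/")
	for i, p := range pat {
		if p == "*" && i == len(pat)-1 {
			// Trailing wildcard: accept if at least one segment remains.
			return len(seg) > i
		}
		if i >= len(seg) {
			return false
		}
		if p != "*" && p != seg[i] {
			return false
		}
	}
	return len(seg) == len(pat)
}

func main() {
	fmt.Println(matchDomain("github.com", "api.github.com"))      // exact pattern vs subdomain
	fmt.Println(matchDomain("*.github.com", "v1.api.github.com")) // deeper subdomain
	fmt.Println(matchDomain("*.github.com", "github.com"))        // base domain
	fmt.Println(matchPath("/api/*", "/api/v1/users"))             // trailing wildcard
}
```

Because `*.github.com` excludes the base domain, allowing both `github.com` and its subdomains still takes two patterns, as the patched documentation says.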