| 1 | +# Agents |
| 2 | + |
| 3 | +This repository ships a small in-guest helper called **intar-agent**. It runs inside every scenario VM and evaluates the probes defined in your `.hcl` files, sending the results back to the host over a virtio-serial channel. |
| 4 | + |
| 5 | +## Project layout |
| 6 | +- `crates/intar-cli`: CLI entrypoint, build script, and agent embedding. |
| 7 | +- `crates/intar-vm`: VM orchestration, cloud-init, runner, and host-side wiring. |
| 8 | +- `crates/intar-agent`: Guest-side agent that executes probes. |
| 9 | +- `crates/intar-probes`: Shared probe spec + parsing/validation logic (host and guest). |
| 10 | +- `crates/intar-ui`: TUI (ratatui/crossterm). |
| 11 | +- `scenarios/`: Example scenarios and probe definitions. |
| 12 | + |
| 13 | +## Instruction scope |
| 14 | +- This root `AGENTS.md` applies to the entire repository. |
| 15 | +- If a subdirectory needs different rules, add a nested `AGENTS.md` (or `AGENTS.override.md`) in that folder. |
| 16 | +- Keep instructions concise and split large guidance across nested files when needed. |
| 17 | + |
| 18 | +## Common commands |
| 19 | +- `intar start <scenario.hcl>` to run a scenario end-to-end. |
| 20 | +- `just check` before shipping changes (fmt + clippy + nextest). |
| 21 | + |
| 22 | +## Change checklist |
| 23 | +- When adding a new probe type: update `crates/intar-probes` and `crates/intar-agent` together, then adjust docs in this file. |
| 24 | +- When editing scenario probes: include an optional `description` to improve the UI briefing/objectives panels. |
| 25 | +- If you touch the agent protocol, update both host and guest handling plus the Protocol section of this file. |
| 26 | +- Do not delete run artifacts while a scenario is still running; cleanup happens after testing, not during the run. |
| 27 | + |
| 28 | +## Commit messages |
| 29 | +Strict Conventional Commits format (no footers): |
| 30 | +``` |
| 31 | +<type>(<scope>): <imperative summary> |
| 32 | + |
| 33 | +Why: |
| 34 | +- <root cause / problem> |
| 35 | +- <why this approach> |
| 36 | + |
| 37 | +Impact: |
| 38 | +- <user-visible impact / risk / perf / compat> |
| 39 | +- <tests run or "Tests: not run (reason)"> |
| 40 | + |
| 41 | +Breaking: |
| 42 | +- <detail> # only when applicable |
| 43 | +``` |
| 44 | + |
| 45 | +Rules (strict): |
| 46 | +- Subject is imperative, <= 72 chars, no trailing period. |
| 47 | +- Scope is required and must be a crate or area: `ui`, `vm`, `agent`, `probes`, |
| 48 | + `cli`, `core`, `docs`, `infra`. |
| 49 | +- Body is required for every commit and must include **Why** and **Impact** |
| 50 | + sections exactly as shown. |
| 51 | +- Use bullet points under **Why** and **Impact** (at least one each). |
| 52 | +- Use **Breaking** only when relevant. |
| 53 | +- Wrap body lines at ~72 chars. |
| 54 | +- If multiple areas change, pick the dominant scope and mention the others in |
| 55 | + **Impact**. |
| 56 | + |
| 57 | +Allowed types: `feat`, `fix`, `refactor`, `docs`, `test`, `style`, `chore`. |
| 58 | + |
| 59 | +Examples: |
| 60 | +``` |
| 61 | +feat(ui): add mission briefing screen |
| 62 | + |
| 63 | +Why: |
| 64 | +- make objectives discoverable before running |
| 65 | + |
| 66 | +Impact: |
| 67 | +- adds new briefing tab and pre-run screen |
| 68 | +- Tests: not run (ui change only) |
| 69 | +``` |
| 70 | + |
| 71 | +``` |
| 72 | +fix(vm): avoid double-boot probes |
| 73 | + |
| 74 | +Why: |
| 75 | +- boot checks were retried on every reconnect |
| 76 | + |
| 77 | +Impact: |
| 78 | +- reduces boot time and removes duplicate probe hits |
| 79 | +- Tests: just check |
| 80 | +``` |
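
The subject-line rules above are mechanical enough to check in code. The following is a minimal sketch, not part of the repository: `check_subject` is a hypothetical helper, and it only covers the subject format, length, trailing period, and the allowed type and scope lists. It does not check the imperative mood or the required **Why**/**Impact** body.

```rust
/// Allowed commit types and scopes, copied from the rules above.
const TYPES: [&str; 7] = ["feat", "fix", "refactor", "docs", "test", "style", "chore"];
const SCOPES: [&str; 8] = ["ui", "vm", "agent", "probes", "cli", "core", "docs", "infra"];

/// Hypothetical helper: validates a subject of the form `<type>(<scope>): <summary>`.
pub fn check_subject(subject: &str) -> Result<(), String> {
    if subject.chars().count() > 72 {
        return Err("subject is longer than 72 chars".to_string());
    }
    if subject.ends_with('.') {
        return Err("subject ends with a period".to_string());
    }
    // Split `feat(ui): add x` into `feat(ui` and `add x`.
    let (head, summary) = subject
        .split_once("): ")
        .ok_or_else(|| "expected `<type>(<scope>): <summary>`".to_string())?;
    // Split `feat(ui` into `feat` and `ui`.
    let (ty, scope) = head
        .split_once('(')
        .ok_or_else(|| "scope is required".to_string())?;
    if !TYPES.contains(&ty) {
        return Err(format!("unknown type `{}`", ty));
    }
    if !SCOPES.contains(&scope) {
        return Err(format!("unknown scope `{}`", scope));
    }
    if summary.trim().is_empty() {
        return Err("summary is empty".to_string());
    }
    Ok(())
}
```

Both example subjects above pass this check, while a subject without a scope such as `fix: avoid double-boot probes` is rejected.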
| 81 | + |
| 82 | +## Safety |
| 83 | +- Never run `git restore` without asking first (it discards local changes). |
| 84 | +- Always clean up VM resources (logs, caches, images, snapshots) after testing, not during the scenario run. |
| 85 | + |
| 86 | +## Validation |
| 87 | +- Run `just check` before shipping changes (fmt + clippy + nextest). |
| 88 | + |
| 89 | +## Lifecycle |
| 90 | +- Build time: `crates/intar-cli/build.rs` cross-compiles `intar-agent` for `x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl` via `cargo zigbuild`, then embeds both binaries with `include_bytes!` in `crates/intar-cli/src/agent.rs`. If the build tools are missing, placeholders are written and `intar start` will refuse to run. |
| 91 | +- Start-up: `intar start <scenario.hcl>` base64-embeds the correct agent binary into cloud-init (see `crates/intar-vm/src/cloud_init.rs`) and drops a systemd unit that keeps `intar-agent` running. |
| 92 | +- Guest side: the agent opens `/dev/virtio-ports/intar.agent` (fallback `/dev/vport0p1`), reads newline-delimited JSON requests, and replies on the same handle. |
| 93 | +- Host side: QEMU exposes the virtio-serial port as a Unix socket at `<run_dir>/<vm>-serial.sock`; `ScenarioRunner::wait_for_agents` pings the agent until it responds before probes are dispatched. |
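
The guest-side fallback between the two device paths can be sketched as below. This is an illustration only, not the agent's actual code: `open_first` and `AGENT_PORTS` are hypothetical names, and the sketch assumes the port is opened read/write as an ordinary file.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Device paths named in the Lifecycle notes, in order of preference.
pub const AGENT_PORTS: [&str; 2] = ["/dev/virtio-ports/intar.agent", "/dev/vport0p1"];

/// Hypothetical helper: open the first candidate that can be opened read/write.
/// Returns the handle plus the path that worked, or the last error seen.
pub fn open_first<P: AsRef<Path>>(candidates: &[P]) -> io::Result<(File, PathBuf)> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no candidate paths given");
    for candidate in candidates {
        match OpenOptions::new().read(true).write(true).open(candidate) {
            Ok(file) => return Ok((file, candidate.as_ref().to_path_buf())),
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

Calling `open_first(&AGENT_PORTS)` inside a guest would try the named virtio port first and only fall back to `/dev/vport0p1` if that open fails.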
| 94 | + |
| 95 | +## Protocol (newline-delimited JSON) |
| 96 | +**Requests** |
| 97 | +- `ping` |
| 98 | +- `check_probe` `{ id, spec }` |
| 99 | +- `check_all` `{ probes: [(id, spec), ...] }` |
| 100 | + |
| 101 | +**Responses** |
| 102 | +- `pong` `{ uptime_secs }` |
| 103 | +- `probe_result` `{ id, passed, message }` |
| 104 | +- `all_results` `{ results: [ { id, passed, message }, ... ] }` |
| 105 | +- `error` `{ message }` |
| 106 | + |
| 107 | +Example round-trip: |
| 108 | +``` |
| 109 | +{"type":"ping"} |
| 110 | +{"type":"pong","uptime_secs":12} |
| 111 | +``` |
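
The round-trip above can be driven from the host with nothing more than a Unix socket and line-based I/O. A minimal sketch follows, assuming a stream already connected to `<run_dir>/<vm>-serial.sock`; the function name `ping` is hypothetical, and the reply is returned as raw JSON text rather than parsed, which a real client would do with a JSON library.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Send one `ping` request line and return the reply line without its newline.
pub fn ping(stream: &UnixStream) -> io::Result<String> {
    // `&UnixStream` implements both `Read` and `Write`, so one handle serves both directions.
    let mut writer = stream;
    writer.write_all(b"{\"type\":\"ping\"}\n")?;
    writer.flush()?;

    let mut reply = String::new();
    let read = BufReader::new(stream).read_line(&mut reply)?;
    if read == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "agent closed the connection before replying",
        ));
    }
    Ok(reply.trim_end().to_string())
}
```

Because each message is a single line, the same pattern extends to `check_probe` and `check_all`: write one JSON line, read one JSON line.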
| 112 | + |
| 113 | +## Probe catalogue (handled inside the guest) |
| 114 | +- `file_content`: `path`, optional `contains`, optional `regex`. |
| 115 | +- `file_exists`: `path`, `exists` (bool). |
| 116 | +- `service`: `service`, `state` (`running|stopped|enabled|disabled`); uses `systemctl`. |
| 117 | +- `port`: `port`, `state` (`listening|closed`), optional `protocol` (`tcp` default); uses tokio sockets (TCP connect / UDP bind). |
| 118 | +- `tcp_ping`: `host`, optional `port` (default `1`), optional `timeout_ms` (default `2000`), optional `state` (`reachable|unreachable`, default `reachable`). |
| 119 | +- `k8s_nodes_ready`: `expected_ready`, optional `kubeconfig`, optional `context`. |
| 120 | +- `k8s_endpoints_nonempty`: `namespace`, `name`, optional `kubeconfig`, optional `context`. |
| 121 | +- `command`: `cmd`, `exit_code`, optional `stdout_contains`; executed via `sh -c`. |
| 122 | +- `http`: `url`, `status`, optional `body_contains`; uses `reqwest` with a 5s timeout. |
| 123 | + |
| 124 | +## Building / refreshing the agent |
| 125 | +Prereqs: `cargo install cargo-zigbuild`, `zig` available in `PATH` (e.g., `brew install zig`), and `qemu-img` for end-to-end runs. |
| 126 | + |
| 127 | +``` |
| 128 | +cargo zigbuild --release --target x86_64-unknown-linux-musl -p intar-agent |
| 129 | +cargo zigbuild --release --target aarch64-unknown-linux-musl -p intar-agent |
| 130 | +cargo build --release -p intar-cli # embeds the freshly built agents |
| 131 | +``` |
| 132 | + |
| 133 | +Artifacts land in `target/<target>/release/intar-agent`; the CLI copies them into `$OUT_DIR/intar-agent-{arch}` during its build script. |
| 134 | + |
| 135 | +## Debugging tips |
| 136 | +- Inside a VM: `systemctl status intar-agent` and `journalctl -u intar-agent` show agent logs (it also prints to stderr). |
| 137 | +- From the host: inspect the generated cloud-init for a run at `~/.local/state/intar/runs/<run>/logs/<vm>/user-data.yaml` to verify the agent blob is present. |
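| 138 | +- Serial socket poking: `socat - UNIX-CONNECT:$HOME/.local/state/intar/runs/<run>/<vm>-serial.sock` and send a `{"type":"ping"}` line to confirm connectivity (use `$HOME` rather than `~`, which the shell does not expand after the `UNIX-CONNECT:` prefix). |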
| 139 | +- Probe logic is shared with the host in `crates/intar-probes`; edit there when adding new probe types so both sides stay in sync. |
| 140 | + |
| 141 | +## UI notes |
| 142 | +- The Logs view shows the SSH session transcript only (input and output). VM console logs are not streamed there. |
| 143 | + |
| 144 | +## Comment guidelines (please read before editing the agent) |
| 145 | +- Avoid filler `//` comments that only restate the code; keep the file readable by letting the code speak for itself. |
| 146 | +- Add a short comment only when behavior is non-obvious (e.g., why we retry on virtio connect, why `/dev/vport0p1` is a fallback, or why a probe command tolerates a specific exit code). |
| 147 | +- Prefer logging (`tracing`/`eprintln!`) over comments when you want runtime visibility. |
| 148 | +- If a workaround is temporary, note the condition for removal in the comment (e.g., `// remove once cloud-localds is packaged on macOS`). |