# Phase 1 Completion Design

**Goal:** Implement the three remaining Phase 1 features: VSOCK guest agent + probes, Prometheus metrics, and E2E tests.

---

## 1. VSOCK Guest Agent + Probes

### Architecture

The guest agent is a binary that runs inside each VM. The node agent injects it transparently at boot — users never need to modify their container images. When `guestAgent.enabled: false`, the VM boots bare with no injection and no probe support.

```
Node Agent                          VM (ext4 rootfs)
┌─────────────────────┐             ┌──────────────────────────┐
│ rootfs.Inject()     │─── ext4 ───▶│ /.imp/guest-agent        │
│                     │             │ /.imp/init               │
│ VSOCKClient         │◀── gRPC ───▶│ gRPC server (port 10000) │
│ Exec()              │             │ Exec handler             │
│ HTTPCheck()         │             │ HTTPCheck handler        │
│ Metrics()           │             │ Metrics handler          │
└─────────────────────┘             └──────────────────────────┘
          ▲
          │ Firecracker Unix socket
          │ /run/imp/sockets/{vmid}.vsock
```

### New packages and files

| Path | Purpose |
|------|---------|
| `cmd/guest-agent/main.go` | Binary that runs inside the VM |
| `internal/proto/guest/guest.proto` | gRPC service definition |
| `internal/proto/guest/` | Generated protobuf + gRPC Go code |
| `internal/guest/` | Server handlers (Exec, HTTPCheck, Metrics) |
| `internal/agent/vsock/client.go` | Host-side gRPC client over VSOCK |
| `internal/agent/probe/runner.go` | Probe polling goroutine per VM |

The `imp-guest-agent` binary is embedded in the `imp-agent` container image using `//go:embed`. The node agent extracts it on startup to a temp path for injection.

### OCI injection

`internal/agent/rootfs/` gains an `Inject(guestAgentBytes []byte, ext4Path string) error` function that adds two files to the ext4 image after the OCI→ext4 conversion:

- `/.imp/guest-agent` — the agent binary (mode 0755)
- `/.imp/init` — init wrapper (mode 0755):

```sh
#!/bin/sh
/.imp/guest-agent &
exec /sbin/init "$@"
```

When `guestAgent.enabled` is true (the default), the node agent:
1. Calls `rootfs.Inject()` after image conversion
2. Appends `init=/.imp/init` to `kernel_args`

### gRPC service definition

```protobuf
syntax = "proto3";
package guest;
option go_package = "github.com/syscode-labs/imp/internal/proto/guest";

service GuestAgent {
  rpc Exec(ExecRequest) returns (ExecResponse);
  rpc HTTPCheck(HTTPCheckRequest) returns (HTTPCheckResponse);
  rpc Metrics(MetricsRequest) returns (MetricsResponse);
}

message ExecRequest { repeated string command = 1; int32 timeout_seconds = 2; }
message ExecResponse { int32 exit_code = 1; string stdout = 2; string stderr = 3; }

message HTTPCheckRequest { int32 port = 1; string path = 2; map<string,string> headers = 3; int32 timeout_seconds = 4; }
message HTTPCheckResponse { int32 status_code = 1; }

message MetricsRequest {}
message MetricsResponse {
  double cpu_usage_ratio = 1;  // 0.0–1.0
  int64 memory_used_bytes = 2;
  int64 disk_used_bytes = 3;
}
```

### VSOCK connection

Firecracker exposes VSOCK via a Unix domain socket at `/run/imp/sockets/{vmid}.vsock`. To dial guest port 10000, the client connects to the Unix socket and sends `CONNECT 10000\n` before the gRPC handshake. `internal/agent/vsock/client.go` wraps this as a `net.Conn` and wires it into the gRPC `WithContextDialer`.

### Probe runner

`internal/agent/probe/runner.go` starts one goroutine per VM after it reaches `Running`. On each probe's period it calls the appropriate RPC, evaluates success/failure, and patches `ImpVM.status.conditions`:

- `StartupProbe` — blocks readiness until passing; failures don't count against liveness
- `ReadinessProbe` — sets the `Ready` condition
- `LivenessProbe` — sets the `Healthy` condition; repeated failures trigger a VM restart

### Opt-out API field

A `GuestAgentConfig` field is added to `ImpVMClassSpec`, `ImpVMTemplateSpec`, and `ImpVMSpec` (same inheritance as probes):

```go
type GuestAgentConfig struct {
	// Enabled controls guest agent injection. Defaults to true.
	// Set to false for bare VMs that do not need probes or VM-side metrics.
	// +optional
	Enabled *bool `json:"enabled,omitempty"`
}
```

Resolution order: ImpVM → ImpVMTemplate → ImpVMClass → default (true).

---

## 2. Prometheus Metrics

### Node agent: `/metrics` on port 9090

One endpoint per node (DaemonSet pod), aggregating all VMs on that node. Scraped by a `PodMonitor`.

**From the Firecracker API** (polled every 15s per running VM):

| Metric | Type | Description |
|--------|------|-------------|
| `imp_vm_vcpu_time_seconds_total` | Counter | vCPU execution time |
| `imp_vm_memory_balloon_bytes` | Gauge | Memory balloon size |

**From the guest agent over VSOCK** (polled every 15s; skipped when `guestAgent.enabled: false`):

| Metric | Type | Description |
|--------|------|-------------|
| `imp_vm_guest_cpu_usage_ratio` | Gauge | CPU usage 0.0–1.0 |
| `imp_vm_guest_memory_used_bytes` | Gauge | RSS memory used |
| `imp_vm_guest_disk_used_bytes` | Gauge | Root disk used |

All node-agent metrics carry the labels `impvm`, `namespace`, `node`, and `impvmclass`.

**VM lifecycle state gauge** (always exported):

| Metric | Type | Description |
|--------|------|-------------|
| `imp_vm_state` | Gauge | 1 for the current state; labels: `impvm`, `namespace`, `node`, `state` |

### Operator: `:8080/metrics`

controller-runtime already exposes reconcile counters and latency. We register two additional histograms at manager startup:

| Metric | Type | Description |
|--------|------|-------------|
| `imp_vm_scheduling_latency_seconds` | Histogram | Pending → Scheduled |
| `imp_vm_boot_latency_seconds` | Histogram | Scheduled → Running |

The operator records timestamps into `ImpVM.status.timestamps` (new fields); the histograms are observed when the agent patches status to `Running`.

### Helm chart additions

New `values.yaml` section (enabled by default):

```yaml
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
```

New templates (both gated on `metrics.serviceMonitor.enabled`):
- `charts/imp/templates/operator/servicemonitor.yaml` — `ServiceMonitor` targeting operator port `8080`
- `charts/imp/templates/agent/podmonitor.yaml` — `PodMonitor` targeting agent port `9090`

The agent DaemonSet and Service get a new `metrics` port (`9090`).

---

## 3. E2E Tests

### Two-layer approach

**Layer 1 — Operator E2E** (`//go:build e2e`, runs in Kind):

Deploys the Helm chart into a Kind cluster. No Firecracker, no KVM needed. Tests:
- CRD installation (all 6 CRDs present, schema validates)
- Operator pod starts and passes `/healthz` + `/readyz`
- Webhook accepts a valid ImpVM, rejects an invalid one (missing classRef)
- ImpVM CRUD: create → reconciler processes without crash, delete → finalizer runs
- Metrics endpoint responds `200 OK` with `imp_vm_`-prefixed metrics

**Layer 2 — Full E2E** (`//go:build e2e_full`, self-hosted KVM runner):

Requires a node with `/dev/kvm`. Tests:
- Full VM boot: ImpVM reaches `Running` state within 60s
- Guest agent connects: `Exec(["echo", "hello"])` returns exit 0
- Probes pass: startup probe succeeds, `Ready` condition set
- Teardown: VM deleted → TAP interface removed, bridge cleaned up
- Metrics contain real values: `imp_vm_guest_cpu_usage_ratio > 0`

### File layout

```
test/
├── e2e/
│   ├── e2e_suite_test.go      # BeforeSuite: kind create + helm install
│   ├── e2e_test.go            # Layer 1 tests (replaces kubebuilder stub)
│   └── utils/                 # shared helpers (already exists)
└── e2e_full/
    ├── suite_test.go          # BeforeSuite: verify /dev/kvm, deploy agent
    └── vm_lifecycle_test.go   # Layer 2 tests
```

### CI integration

`ci.yml` already has an E2E job gated on `vars.E2E_RUNNER_LABEL`. Layer 1 runs on `ubuntu-latest` as a new, ungated job; Layer 2 runs on the self-hosted runner under the existing gate.

```yaml
# New job in ci.yml
e2e-kind:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: helm/kind-action@v1
    - run: go test -tags e2e ./test/e2e/ -v -timeout 10m
```