# Task 10.4 — Live-through-proxy verification (operator playbook)

Use this when an environment has **real provider credentials** and a running proxy. Do not treat mocked `TestClient` / patched `RequestProcessor` integration tests as completion of task 10.4.

## Conversation linkage and horizontal scaling (Requirement 3.3)

The client-facing `/v1/responses` frontend resolves `previous_response_id` using the in-process [`InMemoryResponsesSessionStore`](../../../src/core/services/in_memory_responses_session_store.py) (TTL-backed). That matches the spec’s **phase 1** design: follow-up linkage is **reliable within a single proxy process**.

If you run **multiple workers or processes** behind a load balancer without sticky routing, a valid `previous_response_id` may hit a different worker than the one that stored the prior response. The client then sees a `previous_response_id` “not found” style error even though the id was valid on another instance. Supported operating models until a shared store exists:

- **Single worker / single process** (simplest), or
- **Sticky sessions** so follow-up requests for the same conversation land on the same worker, or
- A future **shared** `IResponsesSessionStore` implementation (out of scope for the current in-memory default).

Document your deployment choice so operators do not assume cross-worker linkage with the default store.

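If you want to observe the limitation directly, a minimal sketch (assuming you start a **second** proxy instance from a copy of the config with `port: 8001`, simulating a second worker without sticky routing):

```powershell
# Create a response on the first instance and capture its id.
$first = curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\"}' | ConvertFrom-Json

# Replay the follow-up against the second instance: expect a "not found" style
# error, because the id lives only in the first process's in-memory store.
curl.exe -sS -X POST "http://127.0.0.1:8001/v1/responses" `
  -H "Content-Type: application/json" `
  -d ('{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Continue.\",\"previous_response_id\":\"' + $first.id + '\"}')
```
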
## Prerequisites

1. **Python**: project venv — `./.venv/Scripts/python.exe` (Windows) or `./.venv/bin/python` (Unix).
2. **Config**: `config/config.yaml` from `config/config.example.yaml`; set `auth.disable_auth: true` for local runs *or* configure proxy API keys and send `Authorization: Bearer <key>` on every call.
3. **Credentials** (export before start, values not committed):
   - Native OpenAI Responses path: `OPENAI_API_KEY`
   - Translated path (pick one): `ANTHROPIC_API_KEY` and/or `GEMINI_API_KEY` (or `GOOGLE_API_KEY` per your Gemini setup)
4. **Backends in YAML**: ensure `openai` or `openai-responses`, plus `anthropic` and/or `gemini`, are enabled with keys resolved from env (see `docs/user_guide/backends/overview.md`).

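Before starting the proxy, a quick sanity check that the credentials from step 3 are exported (a hedged PowerShell sketch; it only verifies the variables are non-empty, not that the keys are valid):

```powershell
# Warn about any credential env var that is missing or empty.
foreach ($name in "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY") {
    $value = [Environment]::GetEnvironmentVariable($name)
    if ([string]::IsNullOrEmpty($value)) { Write-Warning "$name is not set" }
    else { Write-Host "$name is set (length $($value.Length))" }
}
```
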
## Minimal config snippet

Start from `config/config.example.yaml` and make sure the active config contains at least the following shape:

```yaml
host: "127.0.0.1"
port: 8000

auth:
  disable_auth: true

backends:
  default_backend: openai-responses
  openai:
    type: openai
  openai-responses:
    type: openai-responses
  anthropic:
    type: anthropic
  gemini:
    type: gemini

responses_api:
  websocket:
    frontend_enabled: true
    backend_enabled: true
    frontend_path: "/v1/responses"
    connection_timeout: 3600
    max_connections: 100
```

Notes:

- `responses_api.websocket.*` defaults live in `config/config.example.yaml` and are disabled by default there.
- Keep model selectors explicit during verification, even if `default_backend` is set.
- If you want to verify HTTP only, `frontend_enabled` / `backend_enabled` can stay `false`, but leave them `true` if you also plan to run the WebSocket checks in the same session.

## Start the proxy

From repo root:

```powershell
Set-Location C:\Users\Mateusz\source\repos\llm-interactive-proxy
$env:OPENAI_API_KEY = "<secret>"
$env:ANTHROPIC_API_KEY = "<secret>"  # if exercising Anthropic translated path
$env:GEMINI_API_KEY = "<secret>"     # if exercising Gemini translated path
./.venv/Scripts/python.exe -m src.core.cli --config config/config.yaml
```

Use `--default-backend` if you rely on unprefixed models (e.g. `--default-backend openai-responses`). Prefer **explicit** `backend:model` selectors below so routing is unambiguous.

Base URL in examples: `http://127.0.0.1:8000` and `ws://127.0.0.1:8000` (match `host` / `port` in config).

## Quick preflight checks

In a second terminal, confirm the process is reachable before running the verification matrix:

```powershell
curl.exe -i http://127.0.0.1:8000/health
curl.exe -i http://127.0.0.1:8000/v1/models
```

Expected:

- `/health` returns `200`
- `/v1/models` returns `200` and includes at least the backend families you intend to test

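To check the second expectation without eyeballing raw JSON, a hedged sketch (assumes `/v1/models` returns the usual OpenAI-style `{"data": [{"id": ...}]}` list as single-line JSON):

```powershell
# Collect model ids and warn if an expected backend family is absent.
$ids = (curl.exe -sS http://127.0.0.1:8000/v1/models | ConvertFrom-Json).data.id
foreach ($family in "openai-responses:", "anthropic:", "gemini:") {
    if ($ids -like "$family*") { Write-Host "$family models present" }
    else { Write-Warning "$family models missing from /v1/models" }
}
```
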
## Recommended execution matrix

Run these in order so evidence is easy to interpret:

1. Native Responses path, HTTP non-streaming
2. Native Responses path, HTTP streaming SSE
3. Translated path, HTTP non-streaming
4. Translated path, HTTP streaming SSE
5. Optional WebSocket path on one native backend and one translated backend if enabled
6. Optional follow-up request using `previous_response_id` against at least one backend flavor

## A) Native Responses backend path (HTTP)

**Selector (example):** `openai-responses:gpt-4o-mini`
(Alternative: `openai:gpt-4o-mini` — controller still uses `OpenAIResponsesProjector` for `openai` / `openai-responses` backends; `openai-responses` targets provider `/v1/responses` per backend docs.)

### A1. Non-streaming smoke check

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\"}'
```

PowerShell note: the backtick is the line-continuation character (`^` is `cmd.exe` syntax), and the JSON body is single-quoted with `\"` escapes so `curl.exe` receives literal double quotes.

**Assertions**

- HTTP `200`
- JSON body has top-level `object: "response"`
- JSON body has non-empty `id`
- JSON body has `output` and/or equivalent completed response content expected by the frontend contract

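To keep the `id` for the follow-up check in A3, a hedged variant of the same call (assumes the body arrives as single-line JSON, as is typical):

```powershell
# Capture the parsed response so A3 can reuse its id as previous_response_id.
$resp = curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\"}' | ConvertFrom-Json
$resp.object   # expect "response"
$resp.id       # non-empty id, reused in A3
```
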
### A2. Streaming SSE check

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}'
```

On Unix shells, use `\` line continuations and plain single-quoted JSON (no `\"` escapes).

**Assertions**

- HTTP `200`, `Content-Type` contains `text/event-stream`.
- Parsed SSE `data:` lines (JSON) include official Responses event types in order: e.g. `response.created`, progress/output events, then terminal `response.completed` (or `response.failed` / `response.incomplete` on error paths).
- Final SSE sentinel line `data: [DONE]` after the terminal event (proxy contract; see pinned fixtures under `tests/integration/fixtures/responses_api_frontend/`).
- Response `id` and `output` items are non-empty on success; optional: capture `response.id` for a follow-up `previous_response_id` test.

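To list the event types in arrival order without reading the raw stream, a hedged sketch (assumes each SSE payload is a JSON object with a top-level `type` field, as in the pinned fixtures):

```powershell
# Filter JSON data: lines out of the SSE stream and print each event's type.
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}' |
  Select-String '^data: \{' | ForEach-Object {
    ($_.Line -replace '^data: ', '' | ConvertFrom-Json).type
  }
```
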
### A3. Follow-up continuity check

Reuse the `response.id` from A1 or the terminal event from A2:

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Continue in one more sentence.\",\"previous_response_id\":\"<response-id-from-A1-or-A2>\"}'
```

**Assertions**

- HTTP `200`
- No `previous_response_not_found` error
- Result is semantically a continuation rather than a fresh unrelated answer

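If you captured `$resp` after A1, the same check can be scripted (a hedged sketch; string concatenation keeps the `\"` escaping consistent with the other examples):

```powershell
# Follow-up request built from the id captured after A1.
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d ('{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Continue in one more sentence.\",\"previous_response_id\":\"' + $resp.id + '\"}')
```
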
## B) Translated backend-flavor path (HTTP)

**Selector (Anthropic example):** `anthropic:claude-3-5-haiku-20241022`
**Selector (Gemini example):** `gemini:gemini-2.0-flash` (adjust to a model id present in your catalog)

### B1. Anthropic translated path

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"anthropic:claude-3-5-haiku-20241022\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}'
```

### B2. Gemini translated path

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"gemini:gemini-2.0-flash\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}'
```

### B3. Explicit limitation check on a translated path

Use a feature that current tests pin as unsupported for Anthropic or Gemini:

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"anthropic:claude-3-5-haiku-20241022\",\"input\":\"Hello\",\"include\":[\"reasoning.encrypted_content\"]}'
```

**Assertions**

- HTTP `400`
- Error body includes `code: "provider_limitation"`
- Error body is explicit and client-visible; there is no silent success with dropped semantics

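A hedged scripted version of the status check (uses curl's `-w` to append the status code on its own line after the body):

```powershell
# Body lines followed by the HTTP status code on the final line.
$raw = curl.exe -sS -w "`n%{http_code}" -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"anthropic:claude-3-5-haiku-20241022\",\"input\":\"Hello\",\"include\":[\"reasoning.encrypted_content\"]}'
$lines = $raw -split "`n"
$lines[-1]                                   # expect 400
$lines[0..($lines.Count - 2)] -join "`n"     # error body; look for provider_limitation
```
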
**Streaming assertions for B1/B2**

- Same transport expectations as (A): `200`, SSE, terminal `response.completed`, `[DONE]`.
- No silent downgrade: if the request uses a feature the projector cannot support, expect a **client-visible** error with `provider_limitation` semantics (HTTP non-2xx and error body per `ResponsesProtocolError` mapping), not a truncated success stream.

## C) Optional legacy OpenAI-style selector cross-check

This is useful if you want to prove the `openai:` route now honors native projected Responses payloads through the same frontend contract.

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" `
  -H "Content-Type: application/json" `
  -d '{\"model\":\"openai:gpt-4o-mini\",\"input\":[{\"type\":\"message\",\"role\":\"user\",\"content\":[{\"type\":\"input_text\",\"text\":\"Say hello in one short sentence.\"}]}],\"stream\":true}'
```

**Assertions**

- Frontend behavior matches the same client-facing Responses contract used by `openai-responses:*`
- No lossy failure caused by implicit fallback to `/chat/completions`
- SSE still terminates with canonical terminal event plus `[DONE]`

## D) WebSocket (optional, same frontend path)

1. In `config/config.yaml`, set `responses_api.websocket.frontend_enabled: true` and `responses_api.websocket.backend_enabled: true`.
2. If using the native OpenAI WebSocket path, ensure the backend path is `openai-responses:*` and the OpenAI backend is configured for websocket use as documented in `docs/user_guide/features/websocket-transport.md`.
3. Start the proxy with those settings and connect to `ws://127.0.0.1:8000/v1/responses`.
4. Use the built-in demo script for a copy-paste run:

```powershell
./.venv/Scripts/python.exe scripts/demo_responses_websocket.py --mode proxy --proxy-url ws://127.0.0.1:8000/v1/responses --turns 2
```

5. Or send a raw `response.create` message using your preferred client:

```json
{
  "type": "response.create",
  "model": "openai-responses:gpt-4o-mini",
  "input": "Say hello in one short sentence.",
  "max_output_tokens": 100
}
```

6. **Assertions:** receive JSON events until terminal event, preserve monotonic `sequence_number`, and confirm multi-turn continuity works when the second request sends `previous_response_id` from the first result (see the example message below).

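A hedged example of the second-turn message for that continuity check (assumes `previous_response_id` is accepted in `response.create` just as on the HTTP path):

```json
{
  "type": "response.create",
  "model": "openai-responses:gpt-4o-mini",
  "input": "Continue in one more sentence.",
  "previous_response_id": "<id-from-first-turn-terminal-event>"
}
```
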
## Evidence to attach when closing task 10.4

- Redacted transcript: HTTP status, model selector, first/last SSE event types, confirmation of `[DONE]`, and backend flavor used.
- For WebSocket: list of received `type` fields through terminal event.
- Note proxy version / git SHA and config flags (`frontend_enabled`, backend types).
- Record whether the translated path used Anthropic, Gemini, or the legacy `openai:` selector.

## Fixture cross-check

After a live run, optionally diff normalized event shapes against:

- `tests/integration/fixtures/responses_api_frontend/http_streaming_sse.json`
- `tests/integration/fixtures/responses_api_frontend/websocket_streaming.json`

Material differences require updating implementation, tests, or spec tasks — not silent edits to fixtures without investigation.