Commit 39eb996

Author: Mateusz (committed)
feat(responses-api): domain stack, projectors, Responses controller overhaul
- Add Responses domain models, request/event normalizers, semantic events, wire renderer, and session resolution helpers.
- Introduce vendor responses projectors (OpenAI, Anthropic, Gemini), session store interface, and in-memory implementation.
- Refactor responses_controller and streaming SSE serializer; extend FastAPI exception adapters and shared exceptions.
- Update model catalog / backend preparer, application builder, startup stages, and connector surfaces (OpenAI, Anthropic, Gemini, Codex-related paths, opencode_go).
- Add integration fixtures, phase 8 contract tests, websocket integration test (drop .broken), frontend regression, and broad unit coverage.
- Document /v1/responses routed backends in README; add Kiro compliance specs, operator verification note, and internal Codex compatibility plan.

Made-with: Cursor
1 parent a97c769 commit 39eb996

88 files changed

Lines changed: 15241 additions & 4489 deletions

Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
# Task 10.4 — Live-through-proxy verification (operator playbook)

Use this when an environment has **real provider credentials** and a running proxy. Do not treat mocked `TestClient` / patched `RequestProcessor` integration tests as completion of task 10.4.

## Conversation linkage and horizontal scaling (Requirement 3.3)

The client-facing `/v1/responses` frontend resolves `previous_response_id` using the in-process [`InMemoryResponsesSessionStore`](../../../src/core/services/in_memory_responses_session_store.py) (TTL-backed). That matches the spec’s **phase 1** design: follow-up linkage is **reliable within a single proxy process**.

If you run **multiple workers or processes** behind a load balancer without sticky routing, a valid `previous_response_id` may hit a different worker than the one that stored the prior response, and the client will see `previous_response_id` / “not found” style errors even though the id was valid on another instance. Supported operating models until a shared store exists:

- **Single worker / single process** (simplest), or
- **Sticky sessions** so follow-up requests for the same conversation land on the same worker, or
- A future **shared** `IResponsesSessionStore` implementation (out of scope for the current in-memory default; a hypothetical sketch follows below).

Document your deployment choice so operators do not assume cross-worker linkage with the default store.
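
If you eventually need cross-worker linkage, a shared store is the natural extension point. The sketch below is illustrative only and is **not** part of the current codebase: the class, the `save` / `resolve` method names, the Redis key layout, and the `redis` dependency are all assumptions; the real `IResponsesSessionStore` interface may define different methods and richer session payloads.

```python
"""Hypothetical Redis-backed session store sketch (NOT part of the codebase).

Assumes a minimal save/resolve contract; the real IResponsesSessionStore
interface may differ.
"""
import json

import redis  # assumed third-party dependency, not currently required by the proxy


class RedisResponsesSessionStore:
    def __init__(self, url: str = "redis://localhost:6379/0", ttl_seconds: int = 3600) -> None:
        self._client = redis.Redis.from_url(url)
        self._ttl = ttl_seconds

    def save(self, response_id: str, session: dict) -> None:
        # SETEX gives every stored response the same TTL behaviour the in-memory store has.
        self._client.setex(f"responses:session:{response_id}", self._ttl, json.dumps(session))

    def resolve(self, response_id: str) -> dict | None:
        raw = self._client.get(f"responses:session:{response_id}")
        return json.loads(raw) if raw is not None else None
```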
## Prerequisites

1. **Python**: project venv — `./.venv/Scripts/python.exe` (Windows) or `./.venv/bin/python` (Unix).
2. **Config**: `config/config.yaml` from `config/config.example.yaml`; set `auth.disable_auth: true` for local runs *or* configure proxy API keys and send `Authorization: Bearer <key>` on every call.
3. **Credentials** (export before start, values not committed):
   - Native OpenAI Responses path: `OPENAI_API_KEY`
   - Translated path (pick one): `ANTHROPIC_API_KEY` and/or `GEMINI_API_KEY` (or `GOOGLE_API_KEY` per your Gemini setup)
4. **Backends in YAML**: ensure `openai` or `openai-responses`, plus `anthropic` and/or `gemini`, are enabled with keys resolved from env (see `docs/user_guide/backends/overview.md`).
## Minimal config snippet

Start from `config/config.example.yaml` and make sure the active config contains at least the following shape:

```yaml
host: "127.0.0.1"
port: 8000

auth:
  disable_auth: true

backends:
  default_backend: openai-responses
  openai:
    type: openai
  openai-responses:
    type: openai-responses
  anthropic:
    type: anthropic
  gemini:
    type: gemini

responses_api:
  websocket:
    frontend_enabled: true
    backend_enabled: true
    frontend_path: "/v1/responses"
    connection_timeout: 3600
    max_connections: 100
```

Notes:

- `responses_api.websocket.*` defaults live in `config/config.example.yaml` and are disabled by default there.
- Keep model selectors explicit during verification, even if `default_backend` is set.
- If you want to verify HTTP only, `frontend_enabled` / `backend_enabled` can stay `false`, but leave them `true` if you also plan to run the WebSocket checks in the same session.
## Start the proxy

From repo root:

```powershell
Set-Location C:\Users\Mateusz\source\repos\llm-interactive-proxy
$env:OPENAI_API_KEY = "<secret>"
$env:ANTHROPIC_API_KEY = "<secret>"   # if exercising Anthropic translated path
$env:GEMINI_API_KEY = "<secret>"      # if exercising Gemini translated path
./.venv/Scripts/python.exe -m src.core.cli --config config/config.yaml
```

Use `--default-backend` if you rely on unprefixed models (e.g. `--default-backend openai-responses`). Prefer **explicit** `backend:model` selectors below so routing is unambiguous.

Base URL in examples: `http://127.0.0.1:8000` and `ws://127.0.0.1:8000` (match `host` / `port` in config).
## Quick preflight checks

In a second terminal, confirm the process is reachable before running the verification matrix:

```powershell
curl.exe -i http://127.0.0.1:8000/health
curl.exe -i http://127.0.0.1:8000/v1/models
```

Expected:

- `/health` returns `200`
- `/v1/models` returns `200` and includes at least the backend families you intend to test

## Recommended execution matrix

Run these in order so evidence is easy to interpret:

1. Native Responses path, HTTP non-streaming
2. Native Responses path, HTTP streaming SSE
3. Translated path, HTTP non-streaming
4. Translated path, HTTP streaming SSE
5. Optional WebSocket path on one native backend and one translated backend if enabled
6. Optional follow-up request using `previous_response_id` against at least one backend flavor
## A) Native Responses backend path (HTTP)

**Selector (example):** `openai-responses:gpt-4o-mini`
(Alternative: `openai:gpt-4o-mini` — controller still uses `OpenAIResponsesProjector` for `openai` / `openai-responses` backends; `openai-responses` targets provider `/v1/responses` per backend docs.)

### A1. Non-streaming smoke check

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\"}"
```

**Assertions**

- HTTP `200`
- JSON body has top-level `object: "response"`
- JSON body has non-empty `id`
- JSON body has `output` and/or equivalent completed response content expected by the frontend contract
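
If you prefer a scripted check over reading raw curl output, the sketch below automates the A1 assertions. It assumes the proxy is running locally with auth disabled (otherwise add an `Authorization` header) and that `httpx` is available in your local tooling; it is not a project requirement.

```python
"""Scripted A1 smoke check: non-streaming /v1/responses (illustrative sketch)."""
import httpx

BASE_URL = "http://127.0.0.1:8000"  # match host/port in config

payload = {
    "model": "openai-responses:gpt-4o-mini",
    "input": "Say hello in one short sentence.",
}

resp = httpx.post(f"{BASE_URL}/v1/responses", json=payload, timeout=60)
assert resp.status_code == 200, resp.text

body = resp.json()
assert body.get("object") == "response", body  # top-level object marker
assert body.get("id"), body                    # non-empty response id
assert body.get("output"), body                # completed output content

print("A1 OK, response id:", body["id"])       # reuse this id in A3
```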
### A2. Streaming SSE check

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}"
```

On Unix shells, use a single line or single-quoted JSON.

**Assertions**

- HTTP `200`, `Content-Type` contains `text/event-stream`.
- Parsed SSE `data:` lines (JSON) include official Responses event types in order: e.g. `response.created`, progress/output events, then a terminal `response.completed` (or `response.failed` / `response.incomplete` on error paths).
- Final SSE sentinel line `data: [DONE]` after the terminal event (proxy contract; see pinned fixtures under `tests/integration/fixtures/responses_api_frontend/`).
- Response `id` and `output` items are non-empty on success; optionally capture `response.id` for a follow-up `previous_response_id` test.
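
To automate the SSE assertions, an event-order checker like the sketch below can be used. It makes the same assumptions as the A1 sketch (local proxy, auth disabled, `httpx` installed) and only checks transport-level ordering, not payload semantics.

```python
"""Scripted A2 check: SSE event ordering and [DONE] sentinel (illustrative sketch)."""
import json

import httpx

BASE_URL = "http://127.0.0.1:8000"
payload = {
    "model": "openai-responses:gpt-4o-mini",
    "input": "Say hello in one short sentence.",
    "stream": True,
}

event_types: list[str] = []
saw_done = False

with httpx.stream("POST", f"{BASE_URL}/v1/responses", json=payload, timeout=120) as resp:
    assert resp.status_code == 200
    assert "text/event-stream" in resp.headers.get("content-type", "")
    for line in resp.iter_lines():
        if not line.startswith("data:"):
            continue  # ignore blank keep-alive and non-data SSE lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            saw_done = True
            break
        event_types.append(json.loads(data).get("type", "<missing>"))

assert event_types and event_types[0] == "response.created", event_types
assert event_types[-1] in {"response.completed", "response.failed", "response.incomplete"}, event_types
assert saw_done, "missing data: [DONE] sentinel"
print("A2 OK, events:", event_types)
```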
### A3. Follow-up continuity check

Reuse the `response.id` from A1 or the terminal event from A2:

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"openai-responses:gpt-4o-mini\",\"input\":\"Continue in one more sentence.\",\"previous_response_id\":\"<response-id-from-A1-or-A2>\"}"
```

**Assertions**

- HTTP `200`
- No `previous_response_not_found` error
- Result is semantically a continuation rather than a fresh unrelated answer
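
The same check can be scripted by chaining the id printed by the A1 sketch above. As before, `httpx` and the local base URL are assumptions, and the id must resolve on the same worker that served the first request (see the linkage section at the top of this playbook).

```python
"""Scripted A3 check: follow-up linkage via previous_response_id (illustrative sketch)."""
import httpx

BASE_URL = "http://127.0.0.1:8000"
previous_id = "<response-id-from-A1-or-A2>"  # paste the id captured earlier

resp = httpx.post(
    f"{BASE_URL}/v1/responses",
    json={
        "model": "openai-responses:gpt-4o-mini",
        "input": "Continue in one more sentence.",
        "previous_response_id": previous_id,
    },
    timeout=60,
)

assert resp.status_code == 200, resp.text
assert "previous_response_not_found" not in resp.text, resp.text
print("A3 OK:", resp.json().get("id"))
```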
## B) Translated backend-flavor path (HTTP)

**Selector (Anthropic example):** `anthropic:claude-3-5-haiku-20241022`
**Selector (Gemini example):** `gemini:gemini-2.0-flash` (adjust to a model id present in your catalog)

### B1. Anthropic translated path

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"anthropic:claude-3-5-haiku-20241022\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}"
```

### B2. Gemini translated path

```powershell
curl.exe -sN -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"gemini:gemini-2.0-flash\",\"input\":\"Say hello in one short sentence.\",\"stream\":true}"
```

### B3. Explicit limitation check on a translated path

Use a feature that current tests pin as unsupported for Anthropic or Gemini:

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"anthropic:claude-3-5-haiku-20241022\",\"input\":\"Hello\",\"include\":[\"reasoning.encrypted_content\"]}"
```

**Assertions**

- HTTP `400`
- Error body includes `code: "provider_limitation"`
- Error body is explicit and client-visible; there is no silent success with dropped semantics
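
A scripted variant of the B3 check is sketched below. The exact nesting of the error payload is not pinned here, so the sketch simply searches the serialized body for the `provider_limitation` code; adjust the check to the error schema your proxy version returns. `httpx` remains an assumption about local tooling.

```python
"""Scripted B3 check: unsupported feature must fail loudly (illustrative sketch)."""
import httpx

BASE_URL = "http://127.0.0.1:8000"

resp = httpx.post(
    f"{BASE_URL}/v1/responses",
    json={
        "model": "anthropic:claude-3-5-haiku-20241022",
        "input": "Hello",
        "include": ["reasoning.encrypted_content"],
    },
    timeout=60,
)

# The limitation must surface as a client-visible error, never a silent success.
assert resp.status_code == 400, (resp.status_code, resp.text)
assert "provider_limitation" in resp.text, resp.text
print("B3 OK: provider limitation surfaced explicitly")
```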
**Streaming assertions for B1/B2**

- Same transport expectations as (A): `200`, SSE, terminal `response.completed`, `[DONE]`.
- No silent downgrade: if the request uses a feature the projector cannot support, expect a **client-visible** error with `provider_limitation` semantics (HTTP non-2xx and an error body per the `ResponsesProtocolError` mapping), not a truncated success stream.

## C) Optional legacy OpenAI-style selector cross-check

This is useful if you want to prove the `openai:` route now honors native projected Responses payloads through the same frontend contract.

```powershell
curl.exe -sS -X POST "http://127.0.0.1:8000/v1/responses" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"openai:gpt-4o-mini\",\"input\":[{\"type\":\"message\",\"role\":\"user\",\"content\":[{\"type\":\"input_text\",\"text\":\"Say hello in one short sentence.\"}]}],\"stream\":true}"
```

**Assertions**

- Frontend behavior matches the same client-facing Responses contract used by `openai-responses:*`
- No lossy failure caused by implicit fallback to `/chat/completions`
- SSE still terminates with the canonical terminal event plus `[DONE]`

## D) WebSocket (optional, same frontend path)

1. In `config/config.yaml`, set `responses_api.websocket.frontend_enabled: true` and `responses_api.websocket.backend_enabled: true`.
2. If using the native OpenAI WebSocket path, ensure the backend path is `openai-responses:*` and the OpenAI backend is configured for websocket use as documented in `docs/user_guide/features/websocket-transport.md`.
3. Start the proxy with those settings and connect to `ws://127.0.0.1:8000/v1/responses`.
4. Use the built-in demo script for a copy-paste run:

```powershell
./.venv/Scripts/python.exe scripts/demo_responses_websocket.py --mode proxy --proxy-url ws://127.0.0.1:8000/v1/responses --turns 2
```

5. Or send a raw `response.create` message using your preferred client:

```json
{
  "type": "response.create",
  "model": "openai-responses:gpt-4o-mini",
  "input": "Say hello in one short sentence.",
  "max_output_tokens": 100
}
```

6. **Assertions:** receive JSON events until the terminal event, confirm `sequence_number` is monotonically increasing, and confirm multi-turn continuity works when the second request sends `previous_response_id` from the first result.
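
For a minimal scripted client instead of the demo script, the sketch below uses the third-party `websockets` package, which is an assumption about your local tooling rather than a project dependency. It sends one `response.create`, prints event types until it sees a terminal `response.*` event, and checks that `sequence_number` never decreases.

```python
"""Minimal WebSocket client sketch for the /v1/responses frontend (illustrative)."""
import asyncio
import json

import websockets  # assumed to be installed locally; not a project requirement

URL = "ws://127.0.0.1:8000/v1/responses"
TERMINAL = {"response.completed", "response.failed", "response.incomplete"}


async def main() -> None:
    async with websockets.connect(URL) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "model": "openai-responses:gpt-4o-mini",
            "input": "Say hello in one short sentence.",
            "max_output_tokens": 100,
        }))
        last_seq = -1
        async for raw in ws:
            event = json.loads(raw)
            seq = event.get("sequence_number")
            if seq is not None:
                assert seq > last_seq, f"sequence_number went backwards: {seq} <= {last_seq}"
                last_seq = seq
            print(event.get("type"))
            if event.get("type") in TERMINAL:
                break


asyncio.run(main())
```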
## Evidence to attach when closing task 10.4

- Redacted transcript: HTTP status, model selector, first/last SSE event types, confirmation of `[DONE]`, and backend flavor used.
- For WebSocket: list of received `type` fields through the terminal event.
- Note proxy version / git SHA and config flags (`frontend_enabled`, backend types).
- Record whether the translated path used Anthropic, Gemini, or the legacy `openai:` selector.

## Fixture cross-check

After a live run, optionally diff normalized event shapes against:

- `tests/integration/fixtures/responses_api_frontend/http_streaming_sse.json`
- `tests/integration/fixtures/responses_api_frontend/websocket_streaming.json`

Material differences require updating implementation, tests, or spec tasks — not silent edits to fixtures without investigation.
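
If you want a quick, rough diff of event ordering, the sketch below extracts every string `type` field it can find in a fixture and compares that sequence with the event types captured during the live run (for example, the `event_types` list from the A2 sketch). It assumes nothing about the fixture layout beyond JSON with nested `type` keys, so treat mismatches as a prompt for manual inspection rather than an automatic verdict.

```python
"""Rough fixture cross-check sketch: compare event `type` sequences (illustrative)."""
import json
from pathlib import Path


def collect_types(node: object) -> list[str]:
    """Depth-first collection of every string 'type' field in a JSON structure."""
    found: list[str] = []
    if isinstance(node, dict):
        value = node.get("type")
        if isinstance(value, str):
            found.append(value)
        for child in node.values():
            found.extend(collect_types(child))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_types(child))
    return found


fixture_path = Path("tests/integration/fixtures/responses_api_frontend/http_streaming_sse.json")
fixture_types = collect_types(json.loads(fixture_path.read_text(encoding="utf-8")))

live_types = ["response.created", "response.completed"]  # replace with the A2 capture

print("fixture:", fixture_types)
print("live:   ", live_types)
print("match:  ", fixture_types == live_types)
```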
