Background
IResponsesRouteResolver.ResolveRouteValueAsync is called on every /v1/responses request to map vendor/model to a NyxID route. It in turn drives NyxIdLlmCatalogHttpClient to hit NyxID /llm/status and /proxy/services.
Catalog data changes slowly (the provider list changes on a scale of hours to days), yet we re-fetch it on every chat turn.
Proposal
In-process catalog cache with stale-while-revalidate semantics:
- TTL: 60s (fresh), 600s (stale-but-usable)
- Background refresh task wakes every 30s
- On fetch failure during refresh: keep serving the stale snapshot and surface a metric
- Invalidate on an observed 404 unknown route from NyxIdLLMProvider (the route disappeared upstream)
- Bound the key set by scope: the catalog is global, not per-caller, so a single cache instance suffices
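A minimal sketch of the proposed semantics, written in Python for brevity; the real implementation would presumably wrap NyxIdLlmCatalogHttpClient in C#. CatalogCache, CacheMetrics and start_refresh_loop are hypothetical names, fetch stands in for the /llm/status plus /proxy/services calls, and the 60s/600s TTLs and 30s interval are the values proposed above. Because the loop refreshes every 30s, a healthy cache should normally stay under the 60s fresh TTL, so stale_hits mostly grows when refreshes fail, which makes it a usable stale-fallback signal.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class CacheMetrics:
    hits: int = 0              # fresh snapshot served (age <= fresh_ttl)
    stale_hits: int = 0        # stale snapshot served (fresh_ttl < age <= stale_ttl)
    misses: int = 0            # no usable snapshot; fetched synchronously
    refresh_failures: int = 0  # background refresh raised; old snapshot kept

    def hit_ratio(self) -> float:
        served = self.hits + self.stale_hits
        total = served + self.misses
        return served / total if total else 0.0


class CatalogCache(Generic[T]):
    """Hypothetical single process-wide stale-while-revalidate cache."""

    def __init__(self, fetch: Callable[[], T], fresh_ttl: float = 60.0,
                 stale_ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._fresh_ttl = fresh_ttl
        self._stale_ttl = stale_ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None  # None means "no snapshot"
        self.metrics = CacheMetrics()

    def _store(self, value: T) -> None:
        with self._lock:
            self._value, self._fetched_at = value, self._clock()

    def get(self) -> T:
        """Hot path: serve fresh or stale data without calling upstream."""
        with self._lock:
            if self._fetched_at is not None:
                age = self._clock() - self._fetched_at
                if age <= self._fresh_ttl:
                    self.metrics.hits += 1
                    return self._value
                if age <= self._stale_ttl:
                    self.metrics.stale_hits += 1
                    return self._value
            self.metrics.misses += 1
        # Cold start, invalidated, or past the stale window: fetch inline.
        # A failure propagates here, since there is nothing usable to serve.
        value = self._fetch()
        self._store(value)
        return value

    def refresh(self) -> bool:
        """Background revalidation; on failure keep the old snapshot."""
        try:
            value = self._fetch()
        except Exception:
            with self._lock:
                self.metrics.refresh_failures += 1
            return False
        self._store(value)
        return True

    def invalidate(self) -> None:
        """Drop the snapshot, e.g. after a 404 for a route it listed."""
        with self._lock:
            self._value, self._fetched_at = None, None


def start_refresh_loop(cache: CatalogCache, interval: float = 30.0) -> threading.Event:
    """Refresh once immediately, then every `interval` s until the event is set."""
    stop = threading.Event()

    def loop() -> None:
        cache.refresh()
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            cache.refresh()

    threading.Thread(target=loop, daemon=True, name="catalog-refresh").start()
    return stop
```

One design choice worth flagging: invalidate() drops the snapshot outright, so the next request pays a synchronous fetch, and if NyxID is down at that moment the request fails instead of falling back. Keeping the old snapshot while forcing a revalidation would preserve the fallback at the cost of briefly serving the route that just returned 404. The sketch also has no single-flight guard, so concurrent cold misses may each fetch.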
Constraints
- Cannot push catalog state into a GAgent: that would re-introduce an actor with no business reason (CLAUDE.md rule "ReadModel 按需创建", i.e. ReadModels are created on demand). A plain in-memory cache is fine.
- Must coexist with cc-switch users who configure new providers: either expose an auth-gated /v1/admin/invalidate-catalog endpoint or rely on the stale window to converge
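If the admin-endpoint option is chosen, the auth gate itself can stay small. Below is a framework-agnostic sketch that assumes a static admin bearer token compared in constant time with hmac.compare_digest; the handler name, the bearer scheme and the 401/403/204 status choices are assumptions rather than an existing API, and invalidate stands for whatever invalidation hook the catalog cache exposes.

```python
import hmac
from typing import Callable, Mapping


def handle_invalidate_catalog(headers: Mapping[str, str], admin_token: str,
                              invalidate: Callable[[], None]) -> int:
    """Hypothetical handler for POST /v1/admin/invalidate-catalog.

    Returns the HTTP status code; web-framework wiring is left out.
    Assumes a static admin bearer token; the real auth gate may differ.
    """
    # Header lookup is case-sensitive here; real frameworks normalize names.
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401  # missing or malformed credentials
    # Reject when no admin token is configured; compare in constant time
    # so the token is not leaked through response timing.
    if not admin_token or not hmac.compare_digest(token.encode(), admin_token.encode()):
        return 403
    invalidate()  # the next resolve refetches the catalog from NyxID
    return 204
```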
Related
- … /me round-trip). Both reduce NyxID hop pressure on the hot path: same architectural concern (chatty external service), distinct data lifecycle (auth is per-caller and short-lived; the catalog is global and slow-changing).
Acceptance
- p50 latency on /v1/responses drops by the catalog fetch time (~tens of ms)
- Catalog cache hit ratio observable in metrics
- Stale fallback engaged at least once in a chaos test (NyxID down)