README.md (+60 -64): 60 additions & 64 deletions
@@ -32,19 +32,33 @@ It is a compatibility layer, a security layer, a traffic control plane, a debugg

 Beyond basic forwarding, the proxy adds cross-protocol translation, tool safety, routing and failover, session-oriented features (including B2BUA-style handling), boundary-level CBOR captures, and usage tracking. Longer narratives, use-case lists, and feature tours live in the [User Guide](docs/user_guide/index.md).

-## Resilience Behavior
+## Resilience & Reliability

-Recent resilience hardening adds safer retry and failover behavior by default:
+The proxy includes built-in resilience features for production use:

-- Shared async retry policy (`stamina`-backed) is used in major retry hotspots, with canonical `Retry-After` extraction.
-- Routing availability now includes circuit-breaker and endpoint-health gates (when health checks are enabled), so unstable instances are temporarily excluded instead of repeatedly selected.
-- Streaming recovery avoids retry/failover after meaningful output has already started, preventing duplicate output and tool-call corruption.
+- **Smart retry and failover** - Automatic recovery from transient backend failures
+- **Streaming protection** - Avoids retry after output has started, preventing corruption
+- **Health monitoring** - Tracks backend availability and performance

-`resilience.circuit_breaker` is now a first-class config block in `config.yaml` for threshold/cooldown tuning.
+Configure via the `resilience` section in `config.yaml` or see the [Failure Handling Guide](docs/user_guide/features/failure-handling.md).
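Review note: the circuit-breaker gate with threshold/cooldown tuning mentioned in this section can be pictured roughly as follows. This is a minimal sketch, not the proxy's implementation; the class name `CircuitBreaker` and the parameters `failure_threshold` and `cooldown_seconds` are hypothetical and are not taken from `config.yaml`.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, then cools down."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical parameter name
        self.cooldown_seconds = cooldown_seconds    # hypothetical parameter name
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def available(self):
        """Return True when the endpoint may be selected for routing."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: allow traffic again and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self):
        self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A router would call `available()` before selecting an instance, so an unstable instance is skipped for the cooldown window instead of being selected repeatedly.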

 ## Dynamic Tool-Output Compression

-The proxy supports strategy-based compression for `role="tool"` outputs during backend request preparation. It is disabled by default (`dynamic_compression.enabled: false`) with deterministic precedence (CLI > ENV > YAML > defaults) and is configured via `dynamic_compression.*`, `DYNAMIC_COMPRESSION_*`, or CLI flags (for example `--dynamic-compression-enabled`). Legacy Gemini connector truncation controls (`GEMINI_TOOL_OUTPUT_TRUNCATE_*` and backend `tool_output_truncate_*` extras) remain deprecated, but still run through a compatibility path when request-path compaction/dynamic compression is inactive; when request-path reduction is active they are accepted and marked inactive with diagnostics so double reduction is avoided. Legacy session pytest compression toggles remain compatibility-only and emit deprecation warnings. Built-in strategies now include ANSI normalization, dedupe/grouping, unified-diff compaction, directory/listing summaries, search-result grouping, file-read detail/line-window reductions, failure-focused test/build reduction, diagnostics grouping (file/rule), JSON/NDJSON structural summarization, XML parseability-preserving safeguards, noisy-log dedupe with volatile-field normalization, and sensitive-field projection for env/cloud-style outputs. Dynamic compression now also supports RTK-inspired declarative rule filters via `dynamic_compression.declarative_rules` and `dynamic_compression.declarative_rule_files` (with regex guard timeouts from `dynamic_compression.declarative_regex_timeout_ms`), including an 8-stage filter pipeline and optional precedence override over code-based rules. File-detail outputs can optionally include line numbers through `dynamic_compression.file_detail_include_line_numbers` (`DYNAMIC_COMPRESSION_FILE_DETAIL_INCLUDE_LINE_NUMBERS`, `--dynamic-compression-file-detail-include-line-numbers`, `--dynamic-compression-file-detail-exclude-line-numbers`).
Compression observability now includes per-output telemetry records, aggregate savings counters, rate-safe failure/fallback alerts, and effective-configuration diagnostics; optional bounded truncation-recovery handles are configured under `dynamic_compression.recovery.*`.
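Review note: the "CLI > ENV > YAML > defaults" precedence stated for `dynamic_compression.*` amounts to a first-match lookup over ordered sources. The sketch below only illustrates that ordering; `resolve_setting` and its dictionary arguments are hypothetical names, and the proxy's real configuration loader may work differently.

```python
def resolve_setting(name, cli=None, env=None, yaml_cfg=None, defaults=None):
    """Return the first value found, checking CLI, then ENV, then YAML, then defaults."""
    for source in (cli or {}, env or {}, yaml_cfg or {}, defaults or {}):
        # A source only wins when it actually sets the option.
        if name in source and source[name] is not None:
            return source[name]
    raise KeyError(name)


# Example: YAML enables compression, but an environment override disables it.
effective = resolve_setting(
    "enabled",
    env={"enabled": False},
    yaml_cfg={"enabled": True},
    defaults={"enabled": False},
)
```

Here `effective` is `False`, because the environment source outranks YAML.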
See the full [Quick Start Guide](docs/user_guide/quick-start.md) for additional setup, auth, and backend examples.

-### First user message appender (per session)
-Optional once-per-session suffix on the first `user` message (HTTP chat): `auto_append_first_prompt_filename` in config (`.txt`/`.md`), `AUTO_APPEND_FIRST_PROMPT_FILENAME`, or `--auto-append-first-prompt-filename`. File must exist at startup; contents are read once into memory (restart to reload). At default log level, startup logs confirm load; each session logs once when the suffix is merged. Applied after redaction on the outbound request only (history stays pre-transform, like redaction). Skipped for auxiliary-routed calls.
+### Auto-Append First Prompt
+
+Automatically append text from a file to the first user message in each session. Useful for injecting context, instructions, or system prompts without modifying client code.
+Or via CLI: `--auto-append-first-prompt-file ./prompts/context.txt`
+
+The file is loaded once at startup and appended to the first user message of every new session. See [Configuration Guide](docs/user_guide/configuration.md) for details.
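Review note: the once-per-session behavior described here (suffix loaded once, applied to the first user message of each session, stored history left untouched) could look roughly like the sketch below. `FirstPromptAppender`, its method names, and the blank-line separator are assumptions made for illustration; message content is assumed to be a plain string.

```python
class FirstPromptAppender:
    """Illustrative helper: append a fixed suffix to the first user message per session."""

    def __init__(self, suffix):
        self._suffix = suffix          # file contents, read once at startup
        self._seen_sessions = set()    # sessions that already received the suffix

    def apply(self, session_id, messages):
        """Return the outbound message list; the caller's list is never mutated."""
        if session_id in self._seen_sessions:
            return messages
        for index, message in enumerate(messages):
            if message.get("role") == "user":
                self._seen_sessions.add(session_id)
                outbound = list(messages)
                # Separator is an assumption; the source does not specify one.
                outbound[index] = {**message, "content": message["content"] + "\n\n" + self._suffix}
                return outbound
        return messages
```

Because only a copy is modified, the session history keeps the original text while the outbound request carries the suffix.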

 ## Supported Frontend Interfaces

@@ -144,20 +169,32 @@ The backend catalog keeps growing. Current documented backends include:
See the full [Backends Overview](docs/user_guide/backends/overview.md) for configuration and provider-specific notes.

-## Routing Selector Semantics
-
-- `backend:model` selects an explicit backend family.
-- `backend-instance:model` such as `openai.1:gpt-4o` targets a concrete backend instance.
-- `model` and `vendor/model` are model-only selectors.
-- `vendor/model:variant` remains model-only unless `:` appears before the first `/`.
-- URI-style parameters in selectors such as `model?temperature=0.5` are parsed and propagated through routing metadata.
-- Ordered composite failover uses `selectorA|selectorB|selectorC` and advances left-to-right when pre-output failures occur.
-- Weighted composite routing uses `selectorA^selectorB` with optional `[weight=N]` branch prefixes (for example `[weight=3]openai:gpt-4^anthropic:claude-3-5-sonnet`).
-- Composite selectors must not mix `|` and `^` in the same selector string (they are rejected during validation).
-- Composite failover shares one bounded attempt budget with existing retry/failover safety controls.
-- When providing selectors via CLI/env, quote/escape the full selector string. On Windows, `|` is a PowerShell pipeline operator and `^` is a cmd.exe escape character.
-- Explicit-backend configuration and command surfaces such as `--static-route`, replacement targets, and one-off routing require strict `backend:model` format.
-- Legacy random model replacement now routes through a compatibility bridge that emits deprecation metadata (removal timeline: `N+1`, i.e. removed in the release after the one that introduced deprecation) and rejects unsafe mappings with explicit migration errors.
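Review note: the selector rules in the removed list are compact enough to restate as code. The sketch below is an illustrative reading of those bullets only; `parse_selector` and `split_composite` are hypothetical names, the sketch does not distinguish backend families from instances such as `openai.1`, and the proxy's real parser may differ.

```python
from urllib.parse import parse_qsl


def parse_selector(selector):
    """Split one non-composite selector into backend, model and URI-style parameters."""
    target, _, query = selector.partition("?")
    params = dict(parse_qsl(query))
    colon = target.find(":")
    slash = target.find("/")
    # A colon names a backend only when it appears before the first slash.
    if colon != -1 and (slash == -1 or colon < slash):
        backend, model = target[:colon], target[colon + 1:]
    else:
        backend, model = None, target
    return {"backend": backend, "model": model, "params": params}


def split_composite(selector):
    """Classify a selector as single, ordered failover ('|') or weighted ('^')."""
    if "|" in selector and "^" in selector:
        raise ValueError("composite selectors must not mix '|' and '^'")
    if "|" in selector:
        return "failover", selector.split("|")
    if "^" in selector:
        return "weighted", selector.split("^")
    return "single", [selector]
```

Under this reading, `openai.1:gpt-4o` yields backend `openai.1`, while `vendor/model:variant` stays model-only because its colon follows the first slash.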
+## Routing & Model Selection
+
+The proxy uses a flexible selector syntax for routing requests to backends:
+
+**Basic format:** `backend:model`
+```bash
+--default-backend openai:gpt-4o
+--default-backend anthropic:claude-3-5-sonnet
+```
+
+**Failover chains:** Use `|` to specify fallback backends
See [Access Modes](docs/user_guide/access-modes.md) for the security model and deployment guidance.

-## Architecture
-
-```mermaid
-graph TD
-subgraph "Clients"
-A[OpenAI Client]
-B[OpenAI Responses Client]
-C[Anthropic Client]
-D[Gemini Client]
-E[Any LLM App or Agent]
-end
-
-subgraph "LLM Interactive Proxy"
-FE[Frontend APIs]
-Core[Routing Translation Safety Observability]
-BE[Backend Connectors]
-FE --> Core --> BE
-end
-
-subgraph "Providers"
-P1[OpenAI]
-P2[Anthropic]
-P3[Gemini]
-P4[OpenRouter]
-P5[Other Backends]
-end
-
-A --> FE
-B --> FE
-C --> FE
-D --> FE
-E --> FE
-BE --> P1
-BE --> P2
-BE --> P3
-BE --> P4
-BE --> P5
-```
-
-The proxy sits between the client and the provider, which is exactly why it can translate protocols, enforce policy, capture traffic, and route requests without forcing your app to change its calling pattern.
-
 ## Documentation Map
 - **[Quick Start](docs/user_guide/quick-start.md)** - Get running fast
 - **[User Guide](docs/user_guide/index.md)** - End-user documentation and feature catalog
The proxy advances left-to-right through the chain when failures occur, sharing one bounded attempt budget with existing retry/failover safety controls.
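Review note: the left-to-right advance under one bounded attempt budget, combined with the rule that failover stops once output has started, can be sketched as below. `BackendError`, `run_with_failover`, `call_backend`, and `max_attempts` are hypothetical names introduced for illustration; the actual retry machinery is not shown in this diff.

```python
class BackendError(Exception):
    """Illustrative failure type; output_started marks a mid-stream failure."""

    def __init__(self, message, output_started=False):
        super().__init__(message)
        self.output_started = output_started


def run_with_failover(selectors, call_backend, max_attempts=3):
    """Try selectors in order until one succeeds or the shared budget runs out."""
    attempts = 0
    last_error = None
    for selector in selectors:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            return call_backend(selector)
        except BackendError as error:
            if error.output_started:
                # Output already reached the client: retrying would duplicate it.
                raise
            last_error = error
    raise last_error or RuntimeError("no attempts were made")
```

A chain such as `a|b|c` would be passed in as `["a", "b", "c"]`; with `max_attempts=2` the third selector is never tried.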
+
+#### Weighted Routing
+
+Uses `^` with optional `[weight=N]` prefixes for weighted distribution:
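Review note: one way to read the `[weight=N]` syntax is as weights for a random draw over the branches. The sketch below assumes an unprefixed branch gets weight 1, which the diff does not state; `parse_weighted` and `pick_branch` are hypothetical names, and the proxy may distribute traffic differently.

```python
import random
import re

_WEIGHT_PREFIX = re.compile(r"^\[weight=(\d+)\]")


def parse_weighted(selector):
    """Return (target, weight) pairs for a '^'-separated selector."""
    branches = []
    for branch in selector.split("^"):
        match = _WEIGHT_PREFIX.match(branch)
        if match:
            branches.append((branch[match.end():], int(match.group(1))))
        else:
            branches.append((branch, 1))  # assumed default weight
    return branches


def pick_branch(selector, rng=random):
    """Draw one target at random, proportionally to the branch weights."""
    branches = parse_weighted(selector)
    targets = [target for target, _ in branches]
    weights = [weight for _, weight in branches]
    return rng.choices(targets, weights=weights, k=1)[0]
```

With `[weight=3]openai:gpt-4^anthropic:claude-3-5-sonnet`, the first branch would be drawn about three times as often as the second under this assumption.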
docs/user_guide/configuration.md (+82): 82 additions & 0 deletions
@@ -577,6 +577,88 @@ memory:

 > **See Also:** [ProxyMem: Cross-Session Memory](proxymem-memory.md) for detailed documentation including commands, privacy controls, and troubleshooting.
+Strategy-based compression for tool outputs during backend request preparation. Unlike simple truncation, this feature uses content-aware strategies to reduce token usage while preserving semantic information.
+
+Runs **after** history compaction and **before** backend translation. Disabled by default.
+> **See Also:** [Dynamic Tool Output Compression Guide](features/dynamic-tool-output-compression.md) for detailed documentation, compression strategies, use cases, and examples.
+
 ### Database (`database`)

 The proxy uses a unified database layer for storing session data, SSO tokens, and memory summaries. SQLite is the default and requires no configuration.