Skip to content

Commit 6072ec4

Browse files
author
Mateusz
committed
docs: add dynamic compression guide and simplify end-user README
1 parent 4c58c73 commit 6072ec4

10 files changed

Lines changed: 796 additions & 77 deletions

README.md

Lines changed: 60 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -32,19 +32,33 @@ It is a compatibility layer, a security layer, a traffic control plane, a debugg
3232

3333
Beyond basic forwarding, the proxy adds cross-protocol translation, tool safety, routing and failover, session-oriented features (including B2BUA-style handling), boundary-level CBOR captures, and usage tracking. Longer narratives, use-case lists, and feature tours live in the [User Guide](docs/user_guide/index.md).
3434

35-
## Resilience Behavior
35+
## Resilience & Reliability
3636

37-
Recent resilience hardening adds safer retry and failover behavior by default:
37+
The proxy includes built-in resilience features for production use:
3838

39-
- Shared async retry policy (`stamina`-backed) is used in major retry hotspots, with canonical `Retry-After` extraction.
40-
- Routing availability now includes circuit-breaker and endpoint-health gates (when health checks are enabled), so unstable instances are temporarily excluded instead of repeatedly selected.
41-
- Streaming recovery avoids retry/failover after meaningful output has already started, preventing duplicate output and tool-call corruption.
39+
- **Smart retry and failover** - Automatic recovery from transient backend failures
40+
- **Circuit breaker** - Temporarily excludes unhealthy backends to prevent repeated failures
41+
- **Streaming protection** - Avoids retry after output has started, preventing corruption
42+
- **Health monitoring** - Tracks backend availability and performance
4243

43-
`resilience.circuit_breaker` is now a first-class config block in `config.yaml` for threshold/cooldown tuning.
44+
Configure via the `resilience` section in `config.yaml` or see the [Failure Handling Guide](docs/user_guide/features/failure-handling.md).
4445

4546
## Dynamic Tool-Output Compression
4647

47-
The proxy supports strategy-based compression for `role="tool"` outputs during backend request preparation. It is disabled by default (`dynamic_compression.enabled: false`) with deterministic precedence (CLI > ENV > YAML > defaults) and is configured via `dynamic_compression.*`, `DYNAMIC_COMPRESSION_*`, or CLI flags (for example `--dynamic-compression-enabled`). Legacy Gemini connector truncation controls (`GEMINI_TOOL_OUTPUT_TRUNCATE_*` and backend `tool_output_truncate_*` extras) remain deprecated, but still run through a compatibility path when request-path compaction/dynamic compression is inactive; when request-path reduction is active they are accepted and marked inactive with diagnostics so double reduction is avoided. Legacy session pytest compression toggles remain compatibility-only and emit deprecation warnings. Built-in strategies now include ANSI normalization, dedupe/grouping, unified-diff compaction, directory/listing summaries, search-result grouping, file-read detail/line-window reductions, failure-focused test/build reduction, diagnostics grouping (file/rule), JSON/NDJSON structural summarization, XML parseability-preserving safeguards, noisy-log dedupe with volatile-field normalization, and sensitive-field projection for env/cloud-style outputs. Dynamic compression now also supports RTK-inspired declarative rule filters via `dynamic_compression.declarative_rules` and `dynamic_compression.declarative_rule_files` (with regex guard timeouts from `dynamic_compression.declarative_regex_timeout_ms`), including an 8-stage filter pipeline and optional precedence override over code-based rules. File-detail outputs can optionally include line numbers through `dynamic_compression.file_detail_include_line_numbers` (`DYNAMIC_COMPRESSION_FILE_DETAIL_INCLUDE_LINE_NUMBERS`, `--dynamic-compression-file-detail-include-line-numbers`, `--dynamic-compression-file-detail-exclude-line-numbers`). Compression observability now includes per-output telemetry records, aggregate savings counters, rate-safe failure/fallback alerts, and effective-configuration diagnostics; optional bounded truncation-recovery handles are configured under `dynamic_compression.recovery.*`.
48+
**Documentation:** [Dynamic Tool Output Compression Guide](docs/user_guide/features/dynamic-tool-output-compression.md)
49+
50+
Intelligently compress verbose tool outputs to conserve context window space while preserving essential information for LLM reasoning.
51+
52+
**Enable:** `--dynamic-compression-enabled` or set `dynamic_compression.enabled: true` in your config file.
53+
54+
The proxy analyzes tool outputs and applies content-aware compression strategies:
55+
- **Test output reduction** - Keeps only failures from pytest/build output
56+
- **File summarization** - Reduces large file reads to structure or signatures
57+
- **Log deduplication** - Removes repetitive log lines while preserving unique events
58+
- **Search result grouping** - Organizes and limits search matches
59+
- **JSON/XML summarization** - Compresses structured data intelligently
60+
61+
Use this when working with large codebases, extensive test suites, or long-running sessions where token conservation matters.
4862

4963
## Quick Start
5064

@@ -105,8 +119,19 @@ print(response.choices[0].message.content)
105119

106120
See the full [Quick Start Guide](docs/user_guide/quick-start.md) for additional setup, auth, and backend examples.
107121

108-
### First user message appender (per session)
109-
Optional once-per-session suffix on the first `user` message (HTTP chat): `auto_append_first_prompt_filename` in config (`.txt`/`.md`), `AUTO_APPEND_FIRST_PROMPT_FILENAME`, or `--auto-append-first-prompt-filename`. File must exist at startup; contents are read once into memory (restart to reload). At default log level, startup logs confirm load; each session logs once when the suffix is merged. Applied after redaction on the outbound request only (history stays pre-transform, like redaction). Skipped for auxiliary-routed calls.
122+
### Auto-Append First Prompt
123+
124+
Automatically append text from a file to the first user message in each session. Useful for injecting context, instructions, or system prompts without modifying client code.
125+
126+
**Usage:**
127+
```yaml
128+
# config.yaml
129+
auto_append_first_prompt_filename: "./prompts/context.txt"
130+
```
131+
132+
Or via CLI: `--auto-append-first-prompt-file ./prompts/context.txt`
133+
134+
The file is loaded once at startup and appended to the first user message of every new session. See [Configuration Guide](docs/user_guide/configuration.md) for details.
110135

111136
## Supported Frontend Interfaces
112137

@@ -144,20 +169,32 @@ The backend catalog keeps growing. Current documented backends include:
144169
- [Antigravity OAuth](docs/user_guide/backends/antigravity-oauth.md)
145170
See the full [Backends Overview](docs/user_guide/backends/overview.md) for configuration and provider-specific notes.
146171

147-
## Routing Selector Semantics
148-
149-
- `backend:model` selects an explicit backend family.
150-
- `backend-instance:model` such as `openai.1:gpt-4o` targets a concrete backend instance.
151-
- `model` and `vendor/model` are model-only selectors.
152-
- `vendor/model:variant` remains model-only unless `:` appears before the first `/`.
153-
- URI-style parameters in selectors such as `model?temperature=0.5` are parsed and propagated through routing metadata.
154-
- Ordered composite failover uses `selectorA|selectorB|selectorC` and advances left-to-right when pre-output failures occur.
155-
- Weighted composite routing uses `selectorA^selectorB` with optional `[weight=N]` branch prefixes (for example `[weight=3]openai:gpt-4^anthropic:claude-3-5-sonnet`).
156-
- Composite selectors must not mix `|` and `^` in the same selector string (they are rejected during validation).
157-
- Composite failover shares one bounded attempt budget with existing retry/failover safety controls.
158-
- When providing selectors via CLI/env, quote/escape the full selector string. On Windows, `|` is a PowerShell pipeline operator and `^` is a cmd.exe escape character.
159-
- Explicit-backend configuration and command surfaces such as `--static-route`, replacement targets, and one-off routing require strict `backend:model` format.
160-
- Legacy random model replacement now routes through a compatibility bridge that emits deprecation metadata (removal timeline: `N+1`, i.e. removed in the release after the one that introduced deprecation) and rejects unsafe mappings with explicit migration errors.
172+
## Routing & Model Selection
173+
174+
The proxy uses a flexible selector syntax for routing requests to backends:
175+
176+
**Basic format:** `backend:model`
177+
```bash
178+
--default-backend openai:gpt-4o
179+
--default-backend anthropic:claude-3-5-sonnet
180+
```
181+
182+
**Failover chains:** Use `|` to specify fallback backends
183+
```bash
184+
--default-backend "openai:gpt-4o|anthropic:claude-3-5-sonnet|openrouter:openai/gpt-4o"
185+
```
186+
187+
**Weighted routing:** Use `^` to distribute traffic
188+
```bash
189+
--default-backend "[weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet"
190+
```
191+
192+
**With parameters:** Pass model parameters in the selector
193+
```bash
194+
--default-backend "openai:gpt-4o?temperature=0.5&max_tokens=2000"
195+
```
196+
197+
See the [Technical Reference: Routing Selectors](docs/development_guide/routing-selectors.md) for detailed syntax rules and advanced usage.
161198

162199
## Access Modes
163200

@@ -178,47 +215,6 @@ python -m src.core.cli --multi-user-mode --host=0.0.0.0 --api-keys key1,key2
178215

179216
See [Access Modes](docs/user_guide/access-modes.md) for the security model and deployment guidance.
180217

181-
## Architecture
182-
183-
```mermaid
184-
graph TD
185-
subgraph "Clients"
186-
A[OpenAI Client]
187-
B[OpenAI Responses Client]
188-
C[Anthropic Client]
189-
D[Gemini Client]
190-
E[Any LLM App or Agent]
191-
end
192-
193-
subgraph "LLM Interactive Proxy"
194-
FE[Frontend APIs]
195-
Core[Routing Translation Safety Observability]
196-
BE[Backend Connectors]
197-
FE --> Core --> BE
198-
end
199-
200-
subgraph "Providers"
201-
P1[OpenAI]
202-
P2[Anthropic]
203-
P3[Gemini]
204-
P4[OpenRouter]
205-
P5[Other Backends]
206-
end
207-
208-
A --> FE
209-
B --> FE
210-
C --> FE
211-
D --> FE
212-
E --> FE
213-
BE --> P1
214-
BE --> P2
215-
BE --> P3
216-
BE --> P4
217-
BE --> P5
218-
```
219-
220-
The proxy sits between the client and the provider, which is exactly why it can translate protocols, enforce policy, capture traffic, and route requests without forcing your app to change its calling pattern.
221-
222218
## Documentation Map
223219
- **[Quick Start](docs/user_guide/quick-start.md)** - Get running fast
224220
- **[User Guide](docs/user_guide/index.md)** - End-user documentation and feature catalog

config/config.example.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -256,7 +256,7 @@ dynamic_compression:
256256
# Generic pattern and diff controls
257257
output_pattern_rules: []
258258
output_pattern_regex_timeout_ms: 25
259-
# Declarative operator-defined rules (RTK-style 8-stage text pipeline)
259+
# Declarative operator-defined rules (8-stage text pipeline)
260260
declarative_rules: []
261261
declarative_rule_files: [] # Optional extra YAML/JSON files containing declarative_rules
262262
declarative_regex_timeout_ms: 25 # Bounded regex evaluation for full-output matching stages
Lines changed: 114 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,114 @@
1+
# Technical Reference: Routing Selectors
2+
3+
This document provides detailed technical specifications for the proxy's routing selector syntax.
4+
5+
## Selector Formats
6+
7+
### Basic Selectors
8+
9+
- `backend:model` - Selects an explicit backend family (e.g., `openai:gpt-4o`)
10+
- `backend-instance:model` - Targets a concrete backend instance (e.g., `openai.1:gpt-4o`)
11+
- `model` - Model-only selector (uses default backend)
12+
- `vendor/model` - Vendor-prefixed model selector
13+
- `vendor/model:variant` - Model variant selector (remains model-only)
14+
15+
### URI-style Parameters
16+
17+
Selectors can include URI-style parameters that are parsed and propagated through routing metadata:
18+
19+
```
20+
model?temperature=0.5&max_tokens=1000
21+
openai:gpt-4o?temperature=0.7
22+
```
23+
24+
### Composite Selectors
25+
26+
#### Failover Routing (Ordered)
27+
28+
Uses `|` to specify fallback backends when pre-output failures occur:
29+
30+
```
31+
selectorA|selectorB|selectorC
32+
```
33+
34+
Examples:
35+
- `openai:gpt-4o|anthropic:claude-3-5-sonnet|openrouter:openai/gpt-4o`
36+
- `gemini:gemini-1.5-pro|openai:gpt-4o`
37+
38+
The proxy advances left-to-right through the chain when failures occur, sharing one bounded attempt budget with existing retry/failover safety controls.
39+
40+
#### Weighted Routing
41+
42+
Uses `^` with optional `[weight=N]` prefixes for weighted distribution:
43+
44+
```
45+
[weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet
46+
```
47+
48+
This distributes traffic 75% to OpenAI and 25% to Anthropic.
49+
50+
### Selector Rules
51+
52+
1. **No mixing operators** - Composite selectors must not mix `|` and `^` in the same selector string. These are rejected during validation.
53+
54+
2. **Quoting and escaping** - When providing selectors via CLI or environment variables, quote/escape the full selector string:
55+
- Windows: `|` is a PowerShell pipeline operator, `^` is a cmd.exe escape character
56+
- Bash: Use quotes to prevent shell interpretation
57+
58+
3. **Strict format for explicit routing** - `--static-route`, replacement targets, and explicit routing require strict `backend:model` format.
59+
60+
## Examples
61+
62+
### CLI Usage
63+
64+
```bash
65+
# Basic backend selection
66+
python -m src.core.cli --default-backend openai:gpt-4o
67+
68+
# With parameters
69+
python -m src.core.cli --default-backend "openai:gpt-4o?temperature=0.5"
70+
71+
# Failover chain
72+
python -m src.core.cli --default-backend "openai:gpt-4o|anthropic:claude-3-5-sonnet"
73+
74+
# Weighted routing
75+
python -m src.core.cli --default-backend "[weight=3]openai:gpt-4^[weight=1]anthropic:claude-3-5-sonnet"
76+
```
77+
78+
### Environment Variables
79+
80+
```bash
81+
# Default backend with failover
82+
export LLM_BACKEND="openai:gpt-4o|anthropic:claude-3-5-sonnet"
83+
84+
# Static route with parameters
85+
export STATIC_ROUTE="openai:gpt-4o?temperature=0.1"
86+
```
87+
88+
### Configuration File
89+
90+
```yaml
91+
backends:
92+
default_backend: "openai:gpt-4o"
93+
94+
# Failover route for specific model
95+
failover_routes:
96+
"gpt-4o":
97+
policy: "round-robin"
98+
elements: ["openai:gpt-4o", "anthropic:claude-3-5-sonnet"]
99+
```
100+
101+
## Legacy Compatibility
102+
103+
Random model replacement has been deprecated and now routes through a compatibility bridge:
104+
- Emits deprecation metadata
105+
- Removal timeline: N+1 (removed in release after deprecation)
106+
- Rejects unsafe mappings with explicit migration errors
107+
108+
Migrate to composite selectors for similar functionality.
109+
110+
## See Also
111+
112+
- [Backends Overview](../user_guide/backends/overview.md) - Provider setup and configuration
113+
- [Routing Configuration](../user_guide/configuration.md) - Full configuration reference
114+
- [CLI Parameters](../user_guide/cli-parameters.md) - Command-line options

docs/user_guide/cli-parameters.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -225,6 +225,8 @@ When no CLI override applies and `notifications.enabled` is omitted in YAML, the
225225

226226
### Dynamic Tool Output Compression
227227

228+
See the [Dynamic Tool Output Compression Guide](features/dynamic-tool-output-compression.md) for detailed documentation, use cases, and examples.
229+
228230
| CLI Argument | Environment Variable | Description |
229231
| :--- | :--- | :--- |
230232
| `--dynamic-compression-enabled` | N/A | Enable dynamic tool-output compression in request preparation. |

docs/user_guide/configuration.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -577,6 +577,88 @@ memory:
577577

578578
> **See Also:** [ProxyMem: Cross-Session Memory](proxymem-memory.md) for detailed documentation including commands, privacy controls, and troubleshooting.
579579

580+
### Dynamic Tool Output Compression (`dynamic_compression`)
581+
582+
Strategy-based compression for tool outputs during backend request preparation. Unlike simple truncation, this feature uses content-aware strategies to reduce token usage while preserving semantic information.
583+
584+
Runs **after** history compaction and **before** backend translation. Disabled by default.
585+
586+
```yaml
587+
dynamic_compression:
588+
enabled: false
589+
level: "conservative" # conservative | balanced | aggressive
590+
max_level: "aggressive" # Escalation ceiling
591+
min_bytes: 1024 # Skip outputs smaller than this
592+
telemetry_include_content_hashes: true
593+
594+
# Alerting on compression issues
595+
alerts:
596+
enabled: true
597+
failure_threshold: 5
598+
fallback_threshold: 8
599+
window_seconds: 300
600+
cooldown_seconds: 300
601+
602+
# Recovery artifacts for debugging
603+
recovery:
604+
mode: "never" # never | failures | always
605+
min_original_bytes: 4096
606+
min_saved_bytes: 2048
607+
max_artifact_bytes: 262144
608+
max_artifacts: 128
609+
retention_seconds: 86400
610+
storage_dir: "var/compression_recovery"
611+
hint_in_text: false
612+
613+
# Exclusions
614+
disable_categories: [] # e.g., ["search", "file_read"]
615+
disable_methods: [] # e.g., ["line_dedupe"]
616+
disable_tools: [] # e.g., ["shell"]
617+
disable_command_prefixes: [] # e.g., ["git diff --stat"]
618+
619+
# Listing/search/read tuning
620+
noise_directories: ["node_modules", ".git", "target", "__pycache__"]
621+
search_context_lines: 2
622+
search_max_matches_per_file: 8
623+
search_max_total_groups: 100
624+
search_max_line_length: 240
625+
626+
# File detail mode
627+
file_detail_mode: "auto" # auto | full | structure | signatures
628+
file_detail_fallback_mode: "full"
629+
file_detail_auto_full_max_lines: 120
630+
file_detail_auto_structure_max_lines: 280
631+
file_detail_include_line_numbers: false
632+
file_detail_max_lines: null
633+
file_detail_last_n_lines: null
634+
635+
# Pattern-based rules (8-stage pipeline)
636+
output_pattern_rules: []
637+
output_pattern_regex_timeout_ms: 25
638+
639+
# Declarative custom rules
640+
declarative_rules: []
641+
declarative_rule_files: []
642+
declarative_regex_timeout_ms: 25
643+
644+
# Diff output controls
645+
diff_max_lines_per_hunk: 100
646+
diff_max_total_lines: 500
647+
```
648+
649+
| Option | Type | Default | Description |
650+
|--------|------|---------|-------------|
651+
| `enabled` | bool | `false` | Master switch for dynamic compression |
652+
| `level` | str | `"conservative"` | Base compression level |
653+
| `max_level` | str | `"aggressive"` | Maximum level during budget pressure |
654+
| `min_bytes` | int | `1024` | Minimum output size to compress |
655+
| `alerts.enabled` | bool | `true` | Enable compression issue alerts |
656+
| `recovery.mode` | str | `"never"` | When to save original artifacts |
657+
| `file_detail_mode` | str | `"auto"` | File read detail mode |
658+
| `declarative_rules` | list | `[]` | Custom compression rules |
659+
660+
> **See Also:** [Dynamic Tool Output Compression Guide](features/dynamic-tool-output-compression.md) for detailed documentation, compression strategies, use cases, and examples.
661+
580662
### Database (`database`)
581663

582664
The proxy uses a unified database layer for storing session data, SSO tokens, and memory summaries. SQLite is the default and requires no configuration.

0 commit comments

Comments
 (0)