
Commit cd7548c

Mateusz committed
fix: cap Retry-After, add connector-wide backoff, and expose Qwen OAuth retry config
Diagnose and fix the 429->503 surfacing issue in the Qwen OAuth connector.

Bugs fixed:
- Over-budget Retry-After (>10s) was treated as non-retryable instead of being capped to qwen_oauth_initial_rate_limit_retry_max_wait_seconds.
- Backoff was only per-request retry, allowing concurrent requests to keep hitting upstream during the same rate-limit window.

Changes:
- Record connector-wide backoff to dampen bursty 429/503 retries.
- Apply connector-wide backoff via _wait_for_initial_rate_limit_backoff before dispatch, logging remaining time at DEBUG.
- Cap Retry-After values to configured max wait instead of disabling retry.
- Fix stream iterator aclose() awaitability (inspect.isawaitable check).

Configuration: expose existing retry knobs in config.example.yaml, default instance file, and docs:
  enable_qwen_oauth_initial_rate_limit_retry (bool, default true)
  qwen_oauth_initial_rate_limit_retry_max_wait_seconds (float, default 10.0)
  qwen_oauth_initial_rate_limit_retry_random_min_seconds (float, default 3.0)
  qwen_oauth_initial_rate_limit_retry_random_max_seconds (float, default 10.0)
1 parent 2ad59da commit cd7548c

4 files changed

Lines changed: 408 additions & 0 deletions


config/backends/backend-instances/qwen-oauth.default.yaml

Lines changed: 3 additions & 0 deletions
@@ -3,7 +3,10 @@
 connector: qwen-oauth
 extra:
   enable_qwen_oauth_backend_debugging_override: true
+  # Retry the first recoverable upstream rate-limit response once.
   enable_qwen_oauth_initial_rate_limit_retry: true
+  # Cap for Retry-After and randomized fallback delays.
   qwen_oauth_initial_rate_limit_retry_max_wait_seconds: 10.0
+  # Random delay range used when Retry-After is unavailable.
   qwen_oauth_initial_rate_limit_retry_random_min_seconds: 3.0
   qwen_oauth_initial_rate_limit_retry_random_max_seconds: 10.0

config/config.example.yaml

Lines changed: 21 additions & 0 deletions
@@ -262,6 +262,21 @@ dynamic_compression:
   diff_max_lines_per_hunk: 100
   diff_max_total_lines: 500

+# Request-processing unification migration gates (all default-off)
+request_processing_unification:
+  enable_core_canonical_path: false
+  enable_canonical_features: false
+  connector_stream_first: {}
+  retire_legacy_dual_path: false
+  emit_path_selection_metadata: false
+  promotion_requirements:
+    require_characterization_tests: true
+    require_equivalence_tests: true
+    max_non_stream_p95_latency_delta_pct: 10.0
+    max_stream_ttft_delta_pct: 10.0
+    max_memory_delta_pct: 10.0
+    require_cleanup_checks: true
+
 # Logging
 logging:
   level: "INFO" # TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL
@@ -368,6 +383,12 @@ backends:

   # Note: qwen-oauth backend is configured via backend-instances file:
   # config/backends/backend-instances/qwen-oauth.default.yaml
+  #
+  # qwen-oauth retry tuning knobs (in that backend-instances file under `extra:`):
+  # enable_qwen_oauth_initial_rate_limit_retry: true
+  # qwen_oauth_initial_rate_limit_retry_max_wait_seconds: 10.0
+  # qwen_oauth_initial_rate_limit_retry_random_min_seconds: 3.0
+  # qwen_oauth_initial_rate_limit_retry_random_max_seconds: 10.0

   kiro_oauth_auto:
     # Amazon Kiro / Q Developer OAuth connector

docs/user_guide/backends/qwen.md

Lines changed: 322 additions & 0 deletions
@@ -0,0 +1,322 @@
# Qwen Backend

The Qwen backend provides access to Alibaba's Qwen (Tongyi Qianwen) models through OAuth authentication. Qwen models are known for their strong performance, especially in Chinese language tasks and coding.

## Overview

Qwen (通义千问) is Alibaba's large language model series. The proxy supports the `qwen-oauth` backend, which uses OAuth authentication through the Qwen CLI for access to Qwen models.

## Key Features

- OpenAI-compatible API
- OAuth authentication (no API key required)
- Strong Chinese language support
- Excellent coding capabilities (Qwen3-Coder models)
- Competitive performance
- Free tier available

## Configuration

### Prerequisites

The Qwen backend requires the Qwen CLI to be installed and authenticated:

```bash
# Install Qwen CLI (if not already installed)
# Follow Qwen's official installation instructions

# Authenticate with Qwen CLI (one-time)
# This creates oauth_creds.json in your config directory
qwen auth
```

### Debugging Override Flag Required

To use this backend, you **must** launch the proxy with the following CLI flag:

```bash
--enable-qwen-oauth-backend-debugging-override
```

Without this flag, the backend is disabled and will reject all requests with a 403 Forbidden error.

```bash
# Start the proxy
python -m src.core.cli --default-backend qwen-oauth --enable-qwen-oauth-backend-debugging-override
```

### Disclaimer: Internal Development Use Only

**IMPORTANT: PLEASE READ BEFORE USING THIS BACKEND**

This backend connector is implemented **solely** for the internal development purposes of this project. Its primary function is to enable the proper discovery, analysis, and implementation of secure, protocol-specific behaviors required for interoperability and compatibility layers.

**This connector is NOT intended for general usage, production deployment, or as a means to bypass intended access restrictions.**

By using this proxy with the Qwen OAuth backend configuration, you acknowledge and agree to the following terms, which constitute a binding arrangement between you and the authors of this project:

1. **Non-Affiliation**: This project is an independent open-source initiative. It is **not affiliated with, endorsed by, authorized by, or in any way officially connected to** Alibaba Cloud, the Qwen team, or any of their subsidiaries or affiliates. All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.
2. **Restricted Access**: The use of the `--enable-qwen-oauth-backend-debugging-override` CLI flag is strictly reserved for the project's **developers, contributors, and maintainers**. Its sole purpose is debugging and maintaining the proxy's compatibility features.
3. **Prohibited Use**: You must **not** use the debugging override flag if you do not belong to the authorized groups mentioned above.
4. **No Liability**: The authors, contributors, and maintainers of this project hold **no responsibility or liability** for any consequences arising from the use of this flag or backend in violation of these rules, or for any violations of third-party Terms of Service resulting from such use.
5. **User Responsibility**: You accept full responsibility for ensuring your use of this tool complies with all applicable laws and third-party agreements.
6. **Compliance with Provider Terms**: Users of any backend connectors implemented in this proxy server are strictly required to respect all related Terms of Service (ToS) and other agreements with the respective backend providers. You are solely responsible for verifying that your use of this software is compatible with those agreements.
7. **Indemnification**: You agree to indemnify, defend, and hold harmless the authors and contributors of this project from and against any and all claims, liabilities, damages, losses, or expenses, including legal fees and costs, arising out of or in any way connected with your access to or use of this backend or the debugging override flag.

**If you do not agree to these terms, do not use the Qwen OAuth backend or the debugging override flag.**

### Environment Variables

No API key is required. The backend reads OAuth credentials from the local `oauth_creds.json` file created by the Qwen CLI.

### CLI Arguments

```bash
# Start proxy with Qwen as default backend
python -m src.core.cli --default-backend qwen-oauth

# With specific model
python -m src.core.cli --default-backend qwen-oauth --force-model qwen3-coder-plus
```

### YAML Configuration

In this repository, `qwen-oauth` is typically configured in the backend-instances file
`config/backends/backend-instances/qwen-oauth.default.yaml` (under `extra`).

```yaml
# config.yaml
backends:
  qwen-oauth:
    type: qwen-oauth
    extra:
      enable_qwen_oauth_backend_debugging_override: true
      enable_qwen_oauth_initial_rate_limit_retry: true
      qwen_oauth_initial_rate_limit_retry_max_wait_seconds: 10.0
      qwen_oauth_initial_rate_limit_retry_random_min_seconds: 3.0
      qwen_oauth_initial_rate_limit_retry_random_max_seconds: 10.0

default_backend: qwen-oauth
```

With the defaults shown above, the Qwen OAuth connector holds the first recoverable upstream rate-limit response for up to 10 seconds, retries once, and surfaces the error only if the second attempt also fails. If the upstream sends `Retry-After`, that value is used, capped at `qwen_oauth_initial_rate_limit_retry_max_wait_seconds`; otherwise the connector waits for a random fallback delay before retrying.

Rate-limit retry behavior is configurable via backend `extra` parameters:

- `enable_qwen_oauth_initial_rate_limit_retry` (`bool`, default `true`): enable/disable initial retry behavior.
- `qwen_oauth_initial_rate_limit_retry_max_wait_seconds` (`float`, default `10.0`): upper bound for any retry wait.
- `qwen_oauth_initial_rate_limit_retry_random_min_seconds` (`float`, default `3.0`): minimum random fallback wait.
- `qwen_oauth_initial_rate_limit_retry_random_max_seconds` (`float`, default `10.0`): maximum random fallback wait.
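
The wait selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the connector's actual code: `RetryConfig` and `compute_retry_wait` are hypothetical names, and the sketch assumes `Retry-After` has already been parsed into seconds (the header may also carry an HTTP date, which is not handled here).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical container mirroring the four documented `extra` knobs."""

    enabled: bool = True
    max_wait_seconds: float = 10.0
    random_min_seconds: float = 3.0
    random_max_seconds: float = 10.0


def compute_retry_wait(retry_after, config, rng=random):
    """Return the seconds to wait before the single retry, or None if retry is disabled.

    `retry_after` is the upstream Retry-After value in seconds, or None when absent.
    """
    if not config.enabled:
        return None
    if retry_after is not None:
        # Use the upstream hint, never a negative wait.
        wait = max(0.0, float(retry_after))
    else:
        # No hint: pick a random fallback delay inside the configured range.
        low = min(config.random_min_seconds, config.random_max_seconds)
        high = max(config.random_min_seconds, config.random_max_seconds)
        wait = rng.uniform(low, high)
    # Cap the wait instead of giving up on the retry.
    return min(wait, config.max_wait_seconds)
```

With the default values, an upstream `Retry-After` of 30 seconds is capped to 10 seconds rather than making the response non-retryable, which is the behavior change this commit describes.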

At DEBUG log level, the connector also reports connector-wide backoff state with messages like:

- `Qwen OAuth connector-wide rate-limit backoff recorded: wait=...`
- `Qwen OAuth connector-wide rate-limit backoff active: remaining=...`
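
One way to picture connector-wide backoff is a single deadline shared by every request that passes through the connector. The sketch below is an assumption-laden illustration of that idea, not the connector's implementation: `ConnectorBackoff` and its methods are hypothetical names, and the real logic lives behind `_wait_for_initial_rate_limit_backoff`.

```python
import asyncio
import time


class ConnectorBackoff:
    """Hypothetical shared backoff deadline for all requests on one connector."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._until = 0.0  # monotonic timestamp until which dispatch should pause

    def record(self, wait_seconds):
        """Extend the shared deadline; never shorten a longer backoff already in force."""
        self._until = max(self._until, self._clock() + max(0.0, wait_seconds))

    def remaining(self):
        """Seconds left in the current backoff window (0.0 when none is active)."""
        return max(0.0, self._until - self._clock())

    async def wait(self):
        """Sleep until the shared deadline has passed and return the total time slept."""
        waited = 0.0
        while (left := self.remaining()) > 0:
            await asyncio.sleep(left)
            waited += left
        return waited
```

In this sketch, a request that receives a rate-limit response calls `record(...)`, and every request awaits `wait()` before dispatch, so concurrent requests stop hitting the upstream during the same rate-limit window.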
## Available Models

Qwen offers several model variants:

- **Qwen3-Coder**: Specialized for coding tasks
- **Qwen3-Coder-Plus**: Enhanced coding model with better performance
- **Qwen-Turbo**: Fast general-purpose model
- **Qwen-Plus**: Enhanced general-purpose model
- **Qwen-Max**: Most capable general-purpose model

## Usage Examples

### Basic Chat Completion

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

### Coding Task

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [
      {"role": "user", "content": "Write a Python function to implement binary search"}
    ]
  }'
```

### Streaming Response

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [
      {"role": "user", "content": "Explain recursion with examples"}
    ],
    "stream": true
  }'
```
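
A client consuming the streamed response has to reassemble the text from individual chunks. The helper below is a minimal sketch that assumes the proxy emits OpenAI-style server-sent events (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`); that wire format is an assumption based on the OpenAI-compatible API noted above, and `iter_stream_text` is a hypothetical name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (each line may be str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to this helper directly.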
### Chinese Language Task

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "user", "content": "请解释一下Python的装饰器"}
    ]
  }'
```

## Use Cases

### Coding Workflows

Qwen3-Coder models excel at:

- Code generation in multiple languages
- Code completion and suggestions
- Code review and refactoring
- Debugging assistance
- Technical documentation

### Chinese Language Applications

Qwen models are excellent for:

- Chinese language understanding and generation
- Chinese-English translation
- Chinese text analysis
- Chinese content creation

### Cost-Effective Development

Use Qwen for:

- Free tier development and testing
- Cost-effective alternative to Western providers
- High-quality coding assistance
- Bilingual applications

## OAuth Token Management

The proxy automatically manages OAuth tokens:

- Reads credentials from `oauth_creds.json`
- Handles token refresh automatically
- No manual token management required

If you encounter authentication issues, re-authenticate with the Qwen CLI:

```bash
qwen auth
```

## Model Parameters

You can specify model parameters using URI syntax:

```bash
# With temperature
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-oauth:qwen3-coder-plus?temperature=0.7",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

See [URI Model Parameters](../features/uri-model-parameters.md) for more details.
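
To make the shape of such a model string concrete, the sketch below splits it into a backend prefix, a model name, and a parameter dictionary. It is only an illustration of the `backend:model?key=value` form shown in the example; `split_model_uri` is a hypothetical helper, not the proxy's parser, and the linked feature page remains the authority on the supported syntax.

```python
from urllib.parse import parse_qsl


def split_model_uri(model):
    """Split 'backend:model?key=value' into (backend, model_name, params)."""
    target, _, query = model.partition("?")
    backend, sep, name = target.partition(":")
    if not sep:
        # No backend prefix was given, so the whole target is the model name.
        backend, name = None, target
    # Parameter values stay strings here; type conversion is left to the caller.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return backend, name, params
```

For the string used in the example, this returns the backend `qwen-oauth`, the model `qwen3-coder-plus`, and `{"temperature": "0.7"}`.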

## Troubleshooting

### 401 Unauthorized

- Verify you've authenticated with Qwen CLI: `qwen auth`
- Check that `oauth_creds.json` exists in the expected location
- Try re-authenticating if the token has expired

### OAuth Token Expired

```bash
# Re-authenticate with Qwen CLI
qwen auth

# Restart the proxy
python -m src.core.cli --default-backend qwen-oauth
```

### Model Not Found

- Verify the model name is correct (e.g., `qwen3-coder-plus`)
- Check that your account has access to the requested model
- Some models may require special access or higher account tiers

### Rate Limiting

- Free tier accounts have rate limits
- Consider upgrading for higher limits
- Use failover to switch to alternative models

### Chinese Character Encoding Issues

- Ensure your client is using UTF-8 encoding
- Check that the proxy is configured to handle UTF-8
- Verify that your terminal/client supports Chinese characters

## Integration with Coding Agents

Qwen works seamlessly with coding agents:

```bash
# Point your coding agent to the proxy
export OPENAI_API_BASE=http://localhost:8000/v1
export OPENAI_API_KEY=YOUR_PROXY_KEY

# Start the proxy with Qwen
python -m src.core.cli --default-backend qwen-oauth

# Your coding agent will now use Qwen models
```

## Hybrid Backend with Qwen

Qwen models work well in hybrid configurations. A tested combination:

```bash
# Use MiniMax for reasoning, Qwen for execution
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hybrid:[minimax:MiniMax-M2,qwen-oauth:qwen3-coder-plus]",
    "messages": [{"role": "user", "content": "Complex coding task"}]
  }'
```

See [Hybrid Backend](../features/hybrid-backend.md) for more details.

## Related Features

- [Model Name Rewrites](../features/model-name-rewrites.md) - Route models to Qwen
- [Hybrid Backend](../features/hybrid-backend.md) - Combine Qwen with other models
- [Edit Precision Tuning](../features/edit-precision.md) - Optimize for coding tasks

## Related Documentation

- [Backend Overview](overview.md)
- [ZAI Backend](zai.md)
- [OpenRouter Backend](openrouter.md)
