UN-3344 [FIX] Activate litellm retry for all LLM providers#1867
chandrasekharan-zipstack wants to merge 3 commits into `main`
Conversation
litellm's wrapper-level retry (`completion_with_retries`) works for all providers, including httpx-based ones (Anthropic, Vertex, Bedrock, Mistral, Azure AI Foundry), but only activates when `num_retries` is set in kwargs. Our adapters pass `max_retries` (from the user's UI config), which only works for SDK-based providers (OpenAI, Azure). httpx-based providers silently ignored it, resulting in zero retries on transient errors (500, 502, 503).

Bridge the gap by copying the user's `max_retries` value into `num_retries` and setting `retry_strategy` to `exponential_backoff_retry` before calling `litellm.completion()`. litellm internally zeroes `max_retries` during wrapper retries to prevent double-retry with SDK providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
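The bridging described above can be sketched as a standalone helper. This is a minimal sketch, not the actual SDK code: the real method in `llm.py` is the static `_set_litellm_retry_params`, and the model name below is purely illustrative.

```python
def set_litellm_retry_params(completion_kwargs: dict[str, object]) -> None:
    """Bridge the user's max_retries into litellm's wrapper-level retry.

    litellm's completion_with_retries honors num_retries for every provider
    (SDK- and httpx-based alike); max_retries alone only reaches SDK clients
    such as OpenAI and Azure.
    """
    max_retries = completion_kwargs.get("max_retries")
    if isinstance(max_retries, int) and max_retries > 0:
        completion_kwargs["num_retries"] = max_retries
        # Zero the SDK-level value so OpenAI/Azure clients don't also retry
        # on their own and multiply the total attempt count.
        completion_kwargs["max_retries"] = 0
        completion_kwargs["retry_strategy"] = "exponential_backoff_retry"


kwargs: dict[str, object] = {"model": "claude-3-5-sonnet", "max_retries": 5}
set_litellm_retry_params(kwargs)
print(kwargs)
# {'model': 'claude-3-5-sonnet', 'max_retries': 0, 'num_retries': 5,
#  'retry_strategy': 'exponential_backoff_retry'}
```

The mutated kwargs are then passed straight to `litellm.completion(**completion_kwargs)`, so the wrapper retry applies uniformly regardless of provider type.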
Summary by CodeRabbit

Walkthrough: Added a static method `_set_litellm_retry_params` that bridges the user's `max_retries` into litellm's wrapper-level retry parameters.

Changes
| Filename | Overview |
|---|---|
| unstract/sdk1/src/unstract/sdk1/llm.py | New _set_litellm_retry_params static method bridges max_retries → num_retries + retry_strategy for all providers; called consistently in complete(), stream_complete(), and acomplete(). Logic is clean and correct. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant LLM
    participant _set_litellm_retry_params
    participant litellm
    Caller->>LLM: complete(prompt, **kwargs)
    LLM->>LLM: adapter.validate({**self.kwargs, **kwargs})
    LLM->>_set_litellm_retry_params: completion_kwargs (with max_retries=N)
    _set_litellm_retry_params->>_set_litellm_retry_params: isinstance(N, int) and N > 0?
    _set_litellm_retry_params-->>LLM: num_retries=N, max_retries=0, retry_strategy=exponential_backoff_retry
    LLM->>litellm: completion(**completion_kwargs)
    Note over litellm: Wrapper-level retry loop (num_retries=N)<br/>Exponential backoff: 1s->2s->4s->8s->10s cap
    loop On transient error (up to N times)
        litellm->>litellm: retry with backoff
    end
    litellm-->>LLM: response
    LLM-->>Caller: {response: LLMResponseCompat, ...}
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 473-476: The current truthy check for max_retries skips explicit
zero and permits negatives; change the branch to test "max_retries is not None",
validate that max_retries is an integer and >= 0 (or raise a ValueError for
invalid values), then set completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
```python
max_retries = completion_kwargs.get("max_retries")
if max_retries:
    completion_kwargs["num_retries"] = max_retries
    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
Handle zero and invalid retry counts explicitly.
At Line 474, using a truthy check skips an explicit `max_retries=0` and allows negative values through. Please branch on `is not None` and validate bounds before copying.
Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
SDK-based providers (OpenAI, Azure) default to `max_retries=2` internally even when it is not explicitly set. Without zeroing it, the first attempt exhausts the SDK's retries before the wrapper retry kicks in, multiplying total attempts (e.g. 5 SDK + 5 wrapper = 11 instead of the expected 5). Setting `max_retries=0` ensures all retries go through litellm's wrapper uniformly across all providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
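The attempt arithmetic above can be checked with a quick sketch, assuming (per the description) that only the first wrapper attempt carries the SDK client's own retries:

```python
def total_attempts(wrapper_retries: int, sdk_retries: int) -> int:
    # First wrapper attempt makes 1 + sdk_retries calls; subsequent wrapper
    # attempts run with max_retries zeroed, so each is a single call.
    return (1 + sdk_retries) + wrapper_retries

# Without zeroing: the first attempt burns 5 SDK retries, then 5 wrapper retries.
print(total_attempts(wrapper_retries=5, sdk_retries=5))  # 11
# With max_retries zeroed up front: one initial call plus 5 wrapper retries.
print(total_attempts(wrapper_retries=5, sdk_retries=0))  # 6
```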
♻️ Duplicate comments (1)
unstract/sdk1/src/unstract/sdk1/llm.py (1)
474-478: ⚠️ Potential issue | 🟡 Minor — Handle retry bounds explicitly instead of truthy-checking.

On Line 475, truthy branching is brittle for retry config. Please branch on `is None` and validate non-negative integer input before setting retry fields.

Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["max_retries"] = 0
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+completion_kwargs["max_retries"] = 0
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 474-478: The current truthy check for max_retries is brittle; in
unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs
and max_retries and change the branching to check "if max_retries is not None"
then validate that max_retries is an integer and >= 0 (raise a ValueError if
not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries,
completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] =
"exponential_backoff_retry"; ensure invalid types or negative values are
rejected with a clear error message referencing max_retries.
This can be closed if #1886 gets merged.
…ping

Addresses CodeRabbit/Greptile review on PR #1867:

- Replace truthy `if max_retries:` with `isinstance(int) and > 0` so an explicit `max_retries=0` (opt-out) is honored and non-int / negative values don't silently slip through.
- Type `completion_kwargs` as `dict[str, object]` instead of bare `dict`.
- Emit a debug log on wrapper retry activation for observability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Closing this since #1886 got merged.



What

- Activate litellm's wrapper-level retry (`completion_with_retries`) to work for all providers
- Copy `max_retries` to the `num_retries` parameter before calling litellm
- Set `retry_strategy` to `exponential_backoff_retry` for all LLM completion calls

Why
Users configure `max_retries` in the adapter UI (defaults 3-5), but this only works for SDK-based providers (OpenAI, Azure). For httpx-based providers (Anthropic, Vertex AI, Bedrock, Mistral, Azure AI Foundry), retries are silently ignored — zero retries on transient errors.

Production incident: an Anthropic API 500 error after ~9 minutes caused immediate execution failure (ID: 23194211-ac04-442f-8a04-dde0ecf06195).
litellm has a separate wrapper-level retry mechanism that works for ALL providers, but only activates when `num_retries` is set. Our code never set it.

How
Modified `LLM.complete()`, `LLM.stream_complete()`, and `LLM.acomplete()` in the SDK to:

- Copy `max_retries` from kwargs to `num_retries`
- Set `retry_strategy` to `exponential_backoff_retry`

litellm internally sets `max_retries=0` during wrapper retries to prevent double-retry with SDK providers.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.
No. This only activates litellm's existing wrapper retry mechanism with user-configured values. SDK-based providers (OpenAI, Azure) continue to use their native retry via `max_retries` (litellm sets it to 0 during wrapper retries). httpx-based providers now benefit from proper retry handling instead of failing immediately on transient errors.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
The fix can be validated by checking prompt-service logs for retry-related entries when transient LLM errors occur. Previously, errors would show a single failure; with the fix, you should see multiple retry attempts before final failure. Exponential backoff is applied (1s, 2s, 4s, 8s, 10s cap).
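The backoff schedule noted above (1s, 2s, 4s, 8s, capped at 10s) can be sketched as follows. This is illustrative only; litellm's actual implementation may compute delays differently (e.g. with jitter).

```python
def backoff_delay(retry_index: int, base: float = 1.0, cap: float = 10.0) -> float:
    # Exponential backoff: base * 2^n, capped at `cap` seconds.
    return min(base * (2.0 ** retry_index), cap)

print([backoff_delay(n) for n in range(5)])  # [1.0, 2.0, 4.0, 8.0, 10.0]
```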
Screenshots
Checklist
I have read and understood the Contribution Guidelines.