UN-3344 [FIX] Activate litellm retry for all LLM providers#1867
chandrasekharan-zipstack wants to merge 3 commits into `main`
Conversation
litellm's wrapper-level retry (`completion_with_retries`) works for all providers, including httpx-based ones (Anthropic, Vertex, Bedrock, Mistral, Azure AI Foundry), but only activates when `num_retries` is set in kwargs. Our adapters pass `max_retries` (from the user's UI config), which only works for SDK-based providers (OpenAI, Azure). httpx-based providers silently ignored it, resulting in zero retries on transient errors (500, 502, 503).

Bridge the gap by copying the user's `max_retries` value into `num_retries` and setting `retry_strategy` to `exponential_backoff_retry` before calling `litellm.completion()`. litellm internally zeroes `max_retries` during wrapper retries to prevent double-retry with SDK providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
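The bridging described above can be sketched as a standalone helper. This is a minimal sketch, not the actual SDK code: the real method in `llm.py` is the static `_set_litellm_retry_params`, and the model name below is purely illustrative.

```python
def set_litellm_retry_params(completion_kwargs: dict[str, object]) -> None:
    """Bridge the user's max_retries into litellm's wrapper-level retry.

    litellm's completion_with_retries honors num_retries for every provider
    (SDK- and httpx-based alike); max_retries alone only reaches SDK clients
    such as OpenAI and Azure.
    """
    max_retries = completion_kwargs.get("max_retries")
    if isinstance(max_retries, int) and max_retries > 0:
        completion_kwargs["num_retries"] = max_retries
        # Zero the SDK-level value so OpenAI/Azure clients don't also retry
        # on their own and multiply the total attempt count.
        completion_kwargs["max_retries"] = 0
        completion_kwargs["retry_strategy"] = "exponential_backoff_retry"


kwargs: dict[str, object] = {"model": "claude-3-5-sonnet", "max_retries": 5}
set_litellm_retry_params(kwargs)
print(kwargs)
# {'model': 'claude-3-5-sonnet', 'max_retries': 0, 'num_retries': 5,
#  'retry_strategy': 'exponential_backoff_retry'}
```

The mutated kwargs are then passed straight to `litellm.completion(**completion_kwargs)`, so the wrapper retry applies uniformly regardless of provider type.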
Summary by CodeRabbit

Walkthrough: Added a static method `_set_litellm_retry_params` that bridges the user's `max_retries` into litellm's wrapper-level retry parameters.

Changes
| Filename | Overview |
|---|---|
| unstract/sdk1/src/unstract/sdk1/llm.py | New _set_litellm_retry_params static method bridges max_retries → num_retries + retry_strategy for all providers; called consistently in complete(), stream_complete(), and acomplete(). Logic is clean and correct. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant LLM
    participant _set_litellm_retry_params
    participant litellm
    Caller->>LLM: complete(prompt, **kwargs)
    LLM->>LLM: adapter.validate({**self.kwargs, **kwargs})
    LLM->>_set_litellm_retry_params: completion_kwargs (with max_retries=N)
    _set_litellm_retry_params->>_set_litellm_retry_params: isinstance(N, int) and N > 0?
    _set_litellm_retry_params-->>LLM: num_retries=N, max_retries=0, retry_strategy=exponential_backoff_retry
    LLM->>litellm: completion(**completion_kwargs)
    Note over litellm: Wrapper-level retry loop (num_retries=N)<br/>Exponential backoff: 1s->2s->4s->8s->10s cap
    loop On transient error (up to N times)
        litellm->>litellm: retry with backoff
    end
    litellm-->>LLM: response
    LLM-->>Caller: {response: LLMResponseCompat, ...}
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 473-476: The current truthy check for max_retries skips explicit
zero and permits negatives; change the branch to test "max_retries is not None",
validate that max_retries is an integer and >= 0 (or raise a ValueError for
invalid values), then set completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
```python
max_retries = completion_kwargs.get("max_retries")
if max_retries:
    completion_kwargs["num_retries"] = max_retries
    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
Handle zero and invalid retry counts explicitly.
At Line 474, using a truthy check skips an explicit `max_retries=0` and allows negative values through. Please branch on `is not None` and validate bounds before copying.
Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
SDK-based providers (OpenAI, Azure) default to `max_retries=2` internally even when it is not explicitly set. Without zeroing it, the first attempt exhausts the SDK's retries before the wrapper retry kicks in, multiplying total attempts (e.g. 5 SDK + 5 wrapper = 11 instead of the expected 5). Setting `max_retries=0` ensures all retries go through litellm's wrapper uniformly across all providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
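The attempt arithmetic above can be checked with a quick sketch, assuming (per the description) that only the first wrapper attempt carries the SDK client's own retries:

```python
def total_attempts(wrapper_retries: int, sdk_retries: int) -> int:
    # First wrapper attempt makes 1 + sdk_retries calls; subsequent wrapper
    # attempts run with max_retries zeroed, so each is a single call.
    return (1 + sdk_retries) + wrapper_retries

# Without zeroing: the first attempt burns 5 SDK retries, then 5 wrapper retries.
print(total_attempts(wrapper_retries=5, sdk_retries=5))  # 11
# With max_retries zeroed up front: one initial call plus 5 wrapper retries.
print(total_attempts(wrapper_retries=5, sdk_retries=0))  # 6
```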
♻️ Duplicate comments (1)
unstract/sdk1/src/unstract/sdk1/llm.py (1)
474-478: ⚠️ Potential issue | 🟡 Minor — Handle retry bounds explicitly instead of truthy-checking.

On Line 475, truthy branching is brittle for retry config. Please branch on `is None` and validate non-negative integer input before setting retry fields.

Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["max_retries"] = 0
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+completion_kwargs["max_retries"] = 0
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 474-478: The current truthy check for max_retries is brittle; in
unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs
and max_retries and change the branching to check "if max_retries is not None"
then validate that max_retries is an integer and >= 0 (raise a ValueError if
not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries,
completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] =
"exponential_backoff_retry"; ensure invalid types or negative values are
rejected with a clear error message referencing max_retries.
This can be closed if #1886 gets merged.
…ping

Addresses CodeRabbit/Greptile review on PR #1867:

- Replace truthy `if max_retries:` with `isinstance(int) and > 0` so an explicit `max_retries=0` (opt-out) is honored and non-int / negative values don't silently slip through.
- Type `completion_kwargs` as `dict[str, object]` instead of bare `dict`.
- Emit a debug log on wrapper retry activation for observability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Closing this since #1886 got merged.



What

- Activate litellm's wrapper-level retry (`completion_with_retries`) to work for all providers
- Copy `max_retries` to the `num_retries` parameter before calling litellm
- Set `retry_strategy` to `exponential_backoff_retry` for all LLM completion calls

Why
Users configure `max_retries` in the adapter UI (defaults 3-5), but this only works for SDK-based providers (OpenAI, Azure). For httpx-based providers (Anthropic, Vertex AI, Bedrock, Mistral, Azure AI Foundry), retries are silently ignored — zero retries on transient errors.

Production incident: an Anthropic API 500 error after ~9 minutes caused immediate execution failure (ID: 23194211-ac04-442f-8a04-dde0ecf06195).
litellm has a separate wrapper-level retry mechanism that works for ALL providers, but only activates when `num_retries` is set. Our code never set it.

How
Modified `LLM.complete()`, `LLM.stream_complete()`, and `LLM.acomplete()` in the SDK to:

- Copy `max_retries` from kwargs to `num_retries`
- Set `retry_strategy` to `exponential_backoff_retry`

litellm internally sets `max_retries=0` during wrapper retries to prevent double-retry with SDK providers.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.
No. This only activates litellm's existing wrapper retry mechanism with user-configured values. SDK-based providers (OpenAI, Azure) continue to use their native retry via `max_retries` (litellm sets it to 0 during wrapper retries). httpx-based providers now benefit from proper retry handling instead of failing immediately on transient errors.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
The fix can be validated by checking prompt-service logs for retry-related entries when transient LLM errors occur. Previously, errors would show a single failure; with the fix, you should see multiple retry attempts before final failure. Exponential backoff is applied (1s, 2s, 4s, 8s, 10s cap).
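The backoff schedule noted above (1s, 2s, 4s, 8s, capped at 10s) can be sketched as follows. This is illustrative only; litellm's actual implementation may compute delays differently (e.g. with jitter).

```python
def backoff_delay(retry_index: int, base: float = 1.0, cap: float = 10.0) -> float:
    # Exponential backoff: base * 2^n, capped at `cap` seconds.
    return min(base * (2.0 ** retry_index), cap)

print([backoff_delay(n) for n in range(5)])  # [1.0, 2.0, 4.0, 8.0, 10.0]
```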
Screenshots
Checklist
I have read and understood the Contribution Guidelines.