feat: add retry backoff for transient failures#30
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughHTTP requests now retry transient transport failures and 502/503/504 responses with jittered exponential backoff. Status handling now raises specific SDK errors for 400, 401, 403, and 404. Sync retry behavior is covered by new tests. ChangesHTTP retry and status mapping
Sequence Diagram(s)sequenceDiagram
participant Caller
participant "SyncHTTPClient.request" as SyncHTTPClient_request
participant "_execute" as execute
participant "_raise_for_status" as raiseStatus
participant "time.sleep" as sleep
Caller->>SyncHTTPClient_request: request(...)
SyncHTTPClient_request->>execute: send HTTP request
execute-->>SyncHTTPClient_request: response or transport exception
alt transport exception is retryable
SyncHTTPClient_request->>sleep: backoff delay
SyncHTTPClient_request->>execute: retry request
else response status is 502/503/504 and attempts remain
SyncHTTPClient_request->>raiseStatus: inspect status
raiseStatus-->>SyncHTTPClient_request: wait interval
SyncHTTPClient_request->>sleep: wait interval
else response status is 400/401/403/404
SyncHTTPClient_request->>raiseStatus: map status
raiseStatus-->>Caller: specific SDK error
else response is 2xx
SyncHTTPClient_request-->>Caller: parsed JSON
else retries exhausted
SyncHTTPClient_request->>raiseStatus: exhausted retry
raiseStatus-->>Caller: NetworkError
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_rate_limit.py (1)
270-330: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winAdd coverage for transport-exception retries.
These tests only exercise status-code retries. Please add a sync test where
_executeraises a retryable transport exception, then succeeds, so the newexceptretry path is protected.Example test
+ def test_retries_on_transport_error_then_succeeds(self): + client = self._client(max_retries=2) + calls = 0 + + def fake_execute(req): + nonlocal calls + calls += 1 + if calls <= 2: + raise TimeoutError("temporary timeout") + return 200, {}, _fake_200_body() + + sleep_calls: List[float] = [] + with patch.object(client, "_execute", side_effect=fake_execute), \ + patch("time.sleep", side_effect=lambda s: sleep_calls.append(s)), \ + patch("shade.http.random.uniform", side_effect=[0.0, 0.0]): + result = client.request("GET", "/payments") + + assert result == {"id": "pay_123", "status": "ok"} + assert calls == 3 + assert sleep_calls == [1.0, 2.0]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_rate_limit.py` around lines 270 - 330, Add a new synchronous test in test_rate_limit.py alongside test_retries_on_transient_5xx_then_succeeds that covers the retry path when _execute raises a retryable transport exception instead of returning a 5xx. Patch client._execute to raise the transport exception once or twice and then return a 200 response, and assert client.request("GET", "/payments") succeeds and time.sleep is called with the expected backoff values. Use the existing client.request, _execute, and sleep patching pattern so the new except-based retry handling stays covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/shade/http.py`:
- Around line 182-183: The 404 handling in http.py is discarding the response
body before raising NotFoundError, so resource_type and resource_id cannot be
parsed. Update the status == 404 branch in the response handling logic to pass
through the original response body when constructing NotFoundError instead of
always using a generic message. Use the existing error-raising path in the HTTP
response handler so NotFoundError can extract the resource details.
- Around line 86-92: The retry classifier in _is_retryable_transport_error is
missing common aiohttp transport failures, so async connection hiccups can skip
retries. Update _is_retryable_transport_error to treat
aiohttp.ClientConnectionError and its common subclasses such as
ClientConnectorError, ClientOSError, and ServerDisconnectedError as retryable,
while keeping ServerTimeoutError covered via TimeoutError. Make sure the new
checks fit alongside the existing httpx, ConnectionResetError, TimeoutError, and
urllib.error.URLError handling.
---
Nitpick comments:
In `@tests/test_rate_limit.py`:
- Around line 270-330: Add a new synchronous test in test_rate_limit.py
alongside test_retries_on_transient_5xx_then_succeeds that covers the retry path
when _execute raises a retryable transport exception instead of returning a 5xx.
Patch client._execute to raise the transport exception once or twice and then
return a 200 response, and assert client.request("GET", "/payments") succeeds
and time.sleep is called with the expected backoff values. Use the existing
client.request, _execute, and sleep patching pattern so the new except-based
retry handling stays covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5dabe333-6e35-47dc-98b6-5d6e73500046
📒 Files selected for processing (2)
src/shade/http.pytests/test_rate_limit.py
codebestia
left a comment
There was a problem hiding this comment.
LGTM!
Thank you for your contribution.
Description
This PR adds automatic retry handling for transient network failures and server-side 5xx responses. The SDK now retries connection resets, timeouts, and 502/503/504 responses with exponential backoff plus jitter, while immediately surfacing non-retryable client errors such as 400s. Retry attempts are logged at DEBUG level, and exhausted retries raise NetworkError.
Fixes #10
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Checklist:
Summary by CodeRabbit