Dual-path retry: exponential backoff + rate-limit handling #280

Open
MichaelGHSeg wants to merge 3 commits into master from status-response-update

@MichaelGHSeg
Contributor

Summary

Replaces the original retry loop with a structured dual-path retry system.

  • 429 with a Retry-After header: sleep for the specified duration (capped at rate_limit_retry_after_cap). This path does NOT consume the retry budget and is bounded by max_rate_limit_duration (default 12h).
  • Other retryable errors (5xx except 501/505/511, plus 408, 410, 460): counted exponential backoff, bounded by the retries count and max_total_backoff_duration (default 12h).
  • Non-retryable errors: the batch is discarded immediately.
  • Adds an X-Retry-Count header on retry attempts.
  • Narrows success? / success_status? to 2xx only. Net::HTTP doesn't follow redirects, so treating 3xx as success would silently drop batches.
  • Fixes the missing reset! method on FakeBackoffPolicy (its absence caused all transport specs using custom backoff policies to crash).
  • Updates the malformed-JSON-on-200 test to reflect the correct behavior: a 200 means the server accepted the batch regardless of body parseability.
  • E2E: enables the retry test suite.
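
For reviewers, the two paths described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `send_with_retries`, `retryable?`, the `request` and `sleeper` lambdas, and the return symbols are all hypothetical names. It also assumes that the duration bounds are checked against the sum of planned sleeps and that X-Retry-Count counts every retry on either path. Neither detail is confirmed by the summary.

```ruby
# Hypothetical sketch of the dual-path retry loop (not the gem's actual API).

# Status classification as listed in this PR:
# 5xx except 501/505/511, and among 4xx only 408/410/429/460.
def retryable?(status)
  return ![501, 505, 511].include?(status) if (500..599).cover?(status)

  [408, 410, 429, 460].include?(status)
end

# request: lambda taking the retry count (what X-Retry-Count would carry)
#          and returning [status, retry_after_seconds_or_nil].
# sleeper: lambda taking a delay in seconds (injected so tests do not sleep).
def send_with_retries(request:, sleeper:, retries:, retry_after_cap:,
                      max_rate_limit_duration: 12 * 60 * 60,
                      max_total_backoff_duration: 12 * 60 * 60,
                      base: 0.5, cap: 60.0)
  attempt = 0             # all retries so far, on either path (assumption)
  counted = 0             # retries charged against the budget
  rate_limited_for = 0.0  # seconds spent on the 429 path
  backed_off_for = 0.0    # seconds spent on the backoff path

  loop do
    status, retry_after = request.call(attempt)
    return :success if (200..299).cover?(status)

    if status == 429 && retry_after
      # Path 1: honor Retry-After, capped; the retry budget is untouched.
      delay = [retry_after, retry_after_cap].min
      return :discarded if rate_limited_for + delay > max_rate_limit_duration

      rate_limited_for += delay
    elsif retryable?(status)
      # Path 2: counted exponential backoff.
      return :discarded if counted >= retries

      delay = [base * (2**counted), cap].min
      return :discarded if backed_off_for + delay > max_total_backoff_duration

      counted += 1
      backed_off_for += delay
    else
      # Non-retryable: discard immediately.
      return :discarded
    end

    sleeper.call(delay)
    attempt += 1
  end
end
```

With `retries: 1`, a 429 carrying Retry-After followed by a 503 and then a 200 still succeeds in this sketch, because the rate-limit sleep is not charged against the budget.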

Test plan

  • bundle exec rake spec passes
  • E2E basic,retry suites pass (48/48)

- Transport: dual-path retry loop (429+Retry-After vs counted exponential backoff), X-Retry-Count header on retries, retryable status classification (5xx except 501/505/511; 4xx only 408/410/429/460)
- BackoffPolicy: update constants (base 500ms, cap 60s, multiplier 2); add reset! method
- Response: add success? method (2xx+3xx)
- Defaults: add MAX_TOTAL_BACKOFF_DURATION, MAX_RATE_LIMIT_DURATION, RATE_LIMIT_RETRY_AFTER_CAP constants
- Worker: use response.success? instead of status == 200
- Tests: cover new retry paths, X-Retry-Count, parse_retry_after, retryable/non-retryable status codes
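
To illustrate the BackoffPolicy constants in this commit (base 500ms, cap 60s, multiplier 2) and what reset! is for, here is a minimal sketch. `SketchBackoffPolicy` and its constant names are hypothetical, and jitter is left out for clarity, so the real policy may produce different intervals.

```ruby
# Hypothetical sketch of a capped exponential backoff policy.
# Names are illustrative; jitter is intentionally omitted (assumption).
class SketchBackoffPolicy
  BASE_MS = 500     # first interval
  CAP_MS = 60_000   # upper bound on any single interval
  MULTIPLIER = 2    # growth factor per attempt

  def initialize
    @attempts = 0
  end

  # Returns the next interval in milliseconds and advances the counter.
  def next_interval
    interval = [BASE_MS * (MULTIPLIER**@attempts), CAP_MS].min
    @attempts += 1
    interval
  end

  # Restarts the sequence, e.g. before a new batch is sent.
  def reset!
    @attempts = 0
  end
end
```

Under these assumptions the intervals double from 500ms until the eighth call, where the uncapped value (64s) exceeds the cap and 60s is returned instead. Any test double that replaces the policy needs to respond to reset! as well.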
…pdate

- Add reset! to FakeBackoffPolicy so transport specs don't crash
- Narrow success? and success_status? to 2xx only (Net::HTTP doesn't
  follow redirects, so 3xx would silently lose batches)
- Update malformed-JSON-on-200 test to match new semantics: a 200 means
  the server accepted the batch regardless of body parseability
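
The narrowed success check and the malformed-JSON case can be illustrated with a small sketch. `SketchResponse` and `build_response` are hypothetical stand-ins, not the gem's classes, and treating an unparseable body as "no error message" is an assumption made for this example.

```ruby
require 'json'

# Hypothetical stand-in for the response object; not the gem's class.
SketchResponse = Struct.new(:status, :error) do
  # Only 2xx counts as success. A 3xx is not followed by Net::HTTP,
  # so the batch would never have reached its final destination.
  def success?
    (200..299).cover?(status)
  end
end

# Builds a response from a status code and raw body. An unparseable body
# leaves error as nil (assumption) and does not affect success?.
def build_response(status, body)
  parsed = begin
    JSON.parse(body.to_s)
  rescue JSON::ParserError
    nil
  end
  error = parsed.is_a?(Hash) ? parsed['error'] : nil
  SketchResponse.new(status, error)
end
```

In this sketch `build_response(200, 'not json').success?` is true, while a 302 with a valid body is not a success.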