Overview
External service calls (email provider, payment gateway, CDN invalidation) fail immediately on transient errors (network blip, 503). There is no retry logic, so a 1-second network hiccup causes a user-visible payment or email failure that requires manual intervention.
Specifications
Features:
- Retry transient failures (5xx, network errors) with exponential backoff and jitter.
- Do not retry client errors (4xx), as they are not transient.
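The retry/no-retry split above could be expressed as a small predicate. This is a minimal sketch, not the project's actual code: the `CallError` shape, the `isTransient` name, and the list of network error codes are assumptions made for illustration. Real provider SDKs may surface status codes and network failures differently.

```typescript
// Hypothetical error shape: `status` is set for HTTP responses,
// `code` for Node.js-style network errors. Both names are assumptions.
interface CallError {
  status?: number;
  code?: string;
}

// Assumed set of Node.js network error codes treated as transient.
const TRANSIENT_NETWORK_CODES = new Set<string>([
  "ECONNRESET",
  "ETIMEDOUT",
  "ECONNREFUSED",
  "EAI_AGAIN",
]);

// Returns true for 5xx responses and known network errors, false otherwise
// (including all 4xx responses).
function isTransient(err: CallError): boolean {
  if (err.status !== undefined) {
    return err.status >= 500 && err.status <= 599;
  }
  return err.code !== undefined && TRANSIENT_NETWORK_CODES.has(err.code);
}
```

With this shape, `isTransient({ status: 503 })` is true and `isTransient({ status: 404 })` is false.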
Tasks:
- Create a RetryPolicy utility using cockatiel or a custom implementation with max 3 retries, 1s base delay, 2x multiplier, 30s max delay, and full jitter.
- Apply RetryPolicy to EmailService, PaymentProviderService, and CdnService.
- Add a Prometheus counter external_call_retry_total{service, attempt}.
- Add unit tests for retry behavior with mocked transient failures.
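As a rough illustration of the "custom implementation" option, the sketch below uses the parameters from the task list (3 retries, 1s base delay, 2x multiplier, 30s cap, full jitter). It does not use cockatiel. The names `RetryOptions`, `backoffDelayMs`, `withRetry`, and the `onRetry` and `sleep` parameters are hypothetical. Full jitter here means the delay is drawn uniformly between 0 and the capped exponential ceiling.

```typescript
interface RetryOptions {
  maxRetries: number;  // retries after the initial attempt
  baseDelayMs: number; // ceiling for the first retry delay
  multiplier: number;  // growth factor per retry
  maxDelayMs: number;  // upper bound on the ceiling
}

const DEFAULT_OPTIONS: RetryOptions = {
  maxRetries: 3,
  baseDelayMs: 1000,
  multiplier: 2,
  maxDelayMs: 30000,
};

// Full jitter: uniform delay in [0, min(maxDelayMs, baseDelayMs * multiplier^retryIndex)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  retryIndex: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(
    opts.maxDelayMs,
    opts.baseDelayMs * Math.pow(opts.multiplier, retryIndex),
  );
  return Math.floor(random() * ceiling);
}

// Runs `call`, retrying while `shouldRetry(err)` is true and retries remain.
// `onRetry` receives the 1-based retry number before each wait.
async function withRetry<T>(
  call: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  opts: RetryOptions = DEFAULT_OPTIONS,
  onRetry: (attempt: number) => void = () => {},
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let retryIndex = 0; ; retryIndex++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (retryIndex >= opts.maxRetries || !shouldRetry(err)) throw err;
      onRetry(retryIndex + 1);
      await sleep(backoffDelayMs(retryIndex, opts));
    }
  }
}
```

The `onRetry` hook is one place where the external_call_retry_total counter could be incremented with the service name and attempt number. Injecting `sleep` and `random` lets unit tests exercise the retry path without real delays.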
Impacted Files:
- New: src/common/utils/retry-policy.ts
- src/notifications/email/, src/payments/providers/, src/cdn/
Acceptance Criteria
- A single transient 503 from the email provider is retried transparently.
- Once all 3 retries have failed, the error is propagated to the caller.
- Prometheus counter shows retry counts per service.