fix(provider): surface 429/503 provider failures behind ACP errors (reimpl of #653)#706
fix(provider): surface 429/503 provider failures behind ACP errors (reimpl of #653)#706xdotli wants to merge 1 commit into
Conversation
Extend the existing 401/403 provider-auth detection (#546/#564) to also classify provider rate-limit (429) and unavailable (503) failures that ACP agents wrap as a generic '-32603 Internal error'. The real status lives only in the proxy-captured trajectory; surface a sanitized marker (status code only, never response bodies/headers) so retry classification can act on it. - scoring.py: add PROVIDER_RATE_LIMIT; 429-markers -> provider_rate_limit (non-retryable, daily caps won't recover in-window), 503-markers -> infra_failure (transient, stays retryable). - evaluation.py: RetryConfig.exclude_categories defaults add PROVIDER_RATE_LIMIT. - litellm_logging.py: generalize _provider_auth_status extraction to 401/403/429/503 with rate-limit/unavailable hint regexes; keep a backward-compatible auth-only shim. - rollout: add ProviderFailure + _provider_failure_from_runtime/_from_status (split across rollout/_usage.py and rollout/__init__.py for the v0.6 package layout); _classify_acp_error appends the sanitized marker. Operates on a disjoint path from #682's silent-zero-token api-error classifier (which only runs when self._error is None), so no overlap. Reimplements #653 (a fork PR that targeted the pre-split monolithic rollout.py) on release/v0.6.0. Original author: bingran-you. Co-authored-by: bingran-you <bingran@benchflow.ai>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77d6688935
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| failure = _provider_failure_from_runtime(runtime) | ||
| if failure and failure.marker == "provider auth failed": | ||
| return failure.status |
There was a problem hiding this comment.
Preserve auth scan past later provider failures
When a captured trajectory contains a 401/403 followed by a 429 or 503, this backward-compatible auth helper now returns None because _provider_failure_from_runtime() stops at the latest provider failure even if it is non-auth. The previous behavior scanned newest-first but ignored non-auth statuses, so callers/tests using _provider_auth_status_from_runtime() and cleanup's _provider_auth_status_cached can miss a real auth failure in mixed-failure trajectories; keep this helper scanning until it finds 401/403 rather than delegating to the generalized latest-failure helper.
Useful? React with 👍 / 👎.
|
Closing — this and #653 are independent re-implementations of the same fix on |
|
Superseded — closing. This PR was a parallel reimplementation of @bingran-you's fork #653 on the v0.6 |
Reimplements #653 (by @bingran-you) on
release/v0.6.0. The original is a fork PR that patched the pre-split monolithicrollout.py; this lands the same fix mapped onto the v0.6rollout/package layout, with all of #653's tests.What
Extends the existing 401/403 provider-auth detection (#546/#564) to also classify provider rate-limit (429) and unavailable (503) failures that ACP agents wrap as a generic
-32603 Internal error. The real status lives only in the proxy-captured trajectory, so we surface a sanitized marker (status code only — never response bodies or headers) that retry classification can act on.provider auth failedprovider_authprovider rate limitedprovider_rate_limit(new)provider unavailableinfra_failureChanges
_utils/scoring.py— addPROVIDER_RATE_LIMIT; 429-markers →provider_rate_limit, 503-markers →infra_failure.evaluation.py—RetryConfig.exclude_categoriesdefault addsPROVIDER_RATE_LIMIT.providers/litellm_logging.py— generalize_provider_auth_statusextraction to 401/403/429/503 with rate-limit/unavailable hint regexes; keep a backward-compatible auth-only shim (_provider_auth_status_from_failure_record).rollout/_usage.py+rollout/__init__.py— addProviderFailure+_provider_failure_from_runtime/_from_status;_provider_auth_status_from_runtimebecomes an auth-only view; snapshot_provider_failure_cachedincleanup();_classify_acp_errorappendsprovider_failure.error_suffix.Why it doesn't overlap with #682
#682's
_maybe_classify_api_errorruns only whenself._error is None(silent zero-token failures). When a 429 surfaces as an ACP error,self._erroris already set, so #682's path returns early — the failure staysACP_ERROR(retryable) today. This PR operates on that disjoint ACP-error path. One known seam: an ACP-wrapped 429 is treated as non-retryable here, while #682's silent-failure 429 israte_limit/transient; that's intentional (an agent that gave up with a daily-cap 429 should not burn retries), but worth a maintainer eye.Verification
pytest tests/test_scoring.py tests/test_job.py tests/test_eval_worker_retry.py tests/test_provider_auth_detection.py→ 110 passed (includes fix(provider): surface provider failures behind ACP errors #653's new daily-cap/503 cases).-k 'provider or litellm or retry or scoring or api_error or diagnostic'→ 569 passed, 2 skipped.Closes #653 (supersedes the fork PR). Credit: @bingran-you.
Note
Medium Risk
Changes rollout error classification, retry defaults, and eval worker behavior for provider failures; low security risk (still status-only) but affects batch retry spend and error bucketing.
Overview
Extends provider failure detection beyond 401/403 so 429 (rate limit) and 503 (unavailable) hidden behind generic ACP
-32603 Internal errorget sanitized markers from the proxy-captured trajectory (status code only, no response bodies).Rollout path:
ProviderFailureand_provider_failure_from_runtimereplace auth-only snapshots;cleanup()caches_provider_failure_cachedand_classify_acp_errorappendsprovider_failure.error_suffix(e.g.provider rate limited (HTTP 429)).Classification & retries: New
provider_rate_limitcategory; 503 markers map toinfra_failure(still retryable).RetryConfigdefault excludesprovider_rate_limitalongsideprovider_authso daily caps do not burn retries.LiteLLM import: Failure extraction generalized to 401/403/429/503 with hint regexes; trajectories get correct status codes for Bedrock-style daily-cap errors.
Reviewed by Cursor Bugbot for commit 77d6688. Bugbot is set up for automated code reviews on this repo. Configure here.