Skip to content

fix(proxy): classify errors by rule category to preserve failover for service errors#1103

Open
KevinShiCN wants to merge 1 commit into
ding113:devfrom
KevinShiCN:fix/error-rule-category-aware-classification
Open

fix(proxy): classify errors by rule category to preserve failover for service errors#1103
KevinShiCN wants to merge 1 commit into
ding113:devfrom
KevinShiCN:fix/error-rule-category-aware-classification

Conversation

@KevinShiCN
Copy link
Copy Markdown
Contributor

@KevinShiCN KevinShiCN commented Apr 25, 2026

Summary

isNonRetryableClientErrorAsync (and its sync companion) previously treated any matched error rule as NON_RETRYABLE_CLIENT_ERROR, ignoring the rule's category field. This caused user-added rules with categories like service_error or network_error to short-circuit failover/circuit-breaker logic for upstream/transport problems that should actually be retried or routed to a different provider.

This PR adds an explicit category white-list of client-input categories. Only matches whose category falls within this set are classified as NON_RETRYABLE_CLIENT_ERROR; others fall back to the standard ProxyError-based classification (typically PROVIDER_ERROR or SYSTEM_ERROR), preserving failover and circuit-breaker behaviour.

Related Issues:

Reproduction (CCH 0.7.2 production, observed 2026-04-25)

A user added a regex rule with the pattern 529|service.overloaded and category=service_error (and another 503|service.unavailable for service_error). When an upstream provider returned 403 INSUFFICIENT_BALANCE:

{
  "error": {
    "type": "new_api_error",
    "message": "预扣费额度失败, 用户剩余额度: ¥0.214220, 需要预扣费额度: ¥0.352942 (request id: 20260425055420369306260Kfzayl7j)"
  }
}

The substring 529 matched inside the price 0.352942 (positions 3–5 of 352942). The old logic:

  1. matched=trueisNonRetryableClientErrorAsync returned true
  2. categorizeErrorAsync returned NON_RETRYABLE_CLIENT_ERROR
  3. forwarder.ts logged "White-listed client error … stopping immediately" and aborted failover
  4. The client received a generic permission_error: 上游拒绝了本次请求 (cch_session_id: …) with no upstream detail. The user could not tell that the real cause was an exhausted upstream balance.

Symmetric symptom for 503|service.unavailable matching 503 inside a request ID like 2026042505**503**99959….

After this PR:

  • The match still happens (the regex bug in user-added rules is out of scope), but category=service_error is not in the white-list, so the error falls back to PROVIDER_ERROR. Failover and circuit-breaker logic resume normally.
  • Real client-input errors (prompt_limit, validation_error, content_filter, etc.) continue to be white-listed and short-circuit retries as before.

White-list contents

Aligned with the categories actually used by DEFAULT_ERROR_RULES:

prompt_limit, input_limit, context_limit, token_limit,
content_filter, validation_error, thinking_error, parameter_error,
pdf_limit, media_limit, cache_limit, invalid_request,
model_error, store_error

User-added rules with categories outside this list (e.g. service_error, network_error, rate_limit) will no longer be wrongly white-listed. Their underlying errors will be classified by the existing ProxyError-based rules, which is the correct behaviour for upstream/transport failures.

Changes

  • src/app/v1/_lib/proxy/errors.ts
    • New NON_RETRYABLE_CLIENT_ERROR_CATEGORIES set
    • New shouldTreatMatchedRuleAsNonRetryable(result) helper
    • isNonRetryableClientError (sync) and isNonRetryableClientErrorAsync (async) now use the helper instead of returning result.matched directly
    • New getNonRetryableClientErrorCategoriesForTesting() accessor exposing a read-only view of the set for unit tests
  • tests/unit/proxy/error-category-aware-classification.test.ts (new) — 10 unit tests covering:
    • White-listed category (validation_error) → NON_RETRYABLE
    • Non-white-listed categories (service_error, network_error, rate_limit) → not NON_RETRYABLE
    • No rule match → not NON_RETRYABLE
    • Each white-listed category triggers NON_RETRYABLE (data-driven)
    • categorizeErrorAsync regression: substring match on service_error rule still produces PROVIDER_ERROR
    • categorizeErrorAsync regression: network_error rule on plain Error still produces SYSTEM_ERROR
    • categorizeErrorAsync: validation_error rule still produces NON_RETRYABLE_CLIENT_ERROR
    • Sync isNonRetryableClientError respects white-list when cache-hit

Test results

✓ tests/unit/proxy/error-category-aware-classification.test.ts (10/10 passed)
✓ tests/unit/proxy/client-abort-vs-upstream-499.test.ts (regression, 16/16 passed)
✓ tests/unit/lib/error-rule-detector-reload-queue.test.ts (regression, passed)
✓ tests/unit/repository/error-rules-default-codex-responses.test.ts (regression, passed)
✓ tests/unit/repository/error-rules-default-thinking-tooluse.test.ts (regression, passed)

bun run typecheck passes cleanly. bunx biome check is clean on the two changed files.

Backward compatibility / migration

  • No schema or data migration required.
  • For users who currently rely on the previous behaviour (i.e. who added rules with service_error/network_error/etc. categories intending to short-circuit retries), the behaviour change is the entire point of this PR — those errors should retry/failover, which is what the existing ProxyError-based classification already does correctly.
  • If a deployment still wants to white-list a custom category, that should be a follow-up feature (e.g. a non_retryable: boolean column on error_rules), not a hidden side-effect of category.

Out of scope (tracked separately)

  • The user-added regex rules with substring-prone patterns (503|service.unavailable, 529|service.overloaded) are user data, not part of CCH's defaults — the upstream is_default=true rules are unaffected. Users should still tighten those patterns with \b boundaries; that is operator hygiene, not an upstream bug.
  • The 403 fallback message 上游拒绝了本次请求 in error-handler.ts and the aggressive sanitisation in client-error-message.ts strip operationally useful upstream details (e.g. balance amounts). This deserves a separate discussion / issue about how to safely expose business-error context to clients.

PR Checklist

  • Target branch is dev
  • Conventional Commit format
  • No emoji in code
  • No hardcoded user-facing strings (no UI strings touched)
  • bun run typecheck passes
  • bunx biome check clean on changed files
  • New tests added; existing related tests still pass
  • No DB schema / migration changes

Description enhanced by Claude AI

… service errors

Previously, isNonRetryableClientErrorAsync (and the sync companion) treated any
matched error rule as NON_RETRYABLE_CLIENT_ERROR, regardless of the rule's
category. This caused user-added rules with category like service_error or
network_error to short-circuit failover/circuit-breaker logic, even though
their underlying errors should be retried or routed to a different provider.

Worse, regex patterns without word boundaries (e.g. "503|service.unavailable",
"529|service.overloaded") matched 3-digit substrings inside upstream
response bodies (request IDs, price amounts), causing INSUFFICIENT_BALANCE
responses to be misclassified as service-overloaded and silenced.

This change introduces an explicit category white-list for client-input
errors that should genuinely bypass retry/failover. Only matches whose
category falls within this set are classified as NON_RETRYABLE_CLIENT_ERROR;
others fall back to the standard ProxyError-based classification (typically
PROVIDER_ERROR or SYSTEM_ERROR), preserving failover and circuit-breaker
behaviour.

The white-list reflects the categories actually used by DEFAULT_ERROR_RULES:
prompt_limit, input_limit, context_limit, token_limit, content_filter,
validation_error, thinking_error, parameter_error, pdf_limit, media_limit,
cache_limit, invalid_request, model_error, store_error.

Reproduction context (CCH 0.7.2 production, observed 2026-04-25):
- User added rule pattern "529|service.overloaded" with category=service_error
- Upstream returned 403 INSUFFICIENT_BALANCE; response body contained
  substring "529" inside the price "0.352942"
- Old logic: matched -> NON_RETRYABLE -> stop_immediately -> client received
  generic permission_error with no upstream detail
- New logic: matched but category not white-listed -> falls back to
  PROVIDER_ERROR -> failover to next provider -> standard behaviour

Tests: 10 new unit tests covering the white-list and regression cases for
service_error / network_error / rate_limit categories.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

该变更修改了错误分类逻辑,使其仅当检测到的匹配规则的category在新的NON_RETRYABLE_CLIENT_ERROR_CATEGORIES白名单中时,才将其归类为NON_RETRYABLE_CLIENT_ERROR。同时添加了测试验证此分类行为。

Changes

Cohort / File(s) Summary
错误分类逻辑更新
src/app/v1/_lib/proxy/errors.ts
修改isNonRetryableClientErrorisNonRetryableClientErrorAsync函数,引入NON_RETRYABLE_CLIENT_ERROR_CATEGORIES白名单来控制category的识别。新增导出函数getNonRetryableClientErrorCategoriesForTesting()供测试使用。
Category感知分类单元测试
tests/unit/proxy/error-category-aware-classification.test.ts
新增单元测试文件,覆盖异步和同步分类函数的category感知行为,验证白名单规则、分类映射(如network_error -> SYSTEM_ERROR)以及非白名单category的处理逻辑。

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • ding113
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确概括了主要改动:通过规则分类来分类错误以保留服务错误的故障转移。这正是PR的核心目标。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed PR description详细说明了问题根源、解决方案、白名单内容、测试结果和向后兼容性,与代码变更完全相关。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot requested a review from ding113 April 25, 2026 09:04
@github-actions github-actions Bot added bug Something isn't working area:Error Rule area:core labels Apr 25, 2026
Comment on lines +715 to 722
/**
* 暴露内部白名单常量供测试使用
*
* 注意:这是只读副本,外部不应修改运行时白名单。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Misleading "read-only copy" comment; actual object reference is returned

The JSDoc says "这是只读副本" (this is a read-only copy), but the function returns the live NON_RETRYABLE_CLIENT_ERROR_CATEGORIES object, not a copy. ReadonlySet<string> is a TypeScript compile-time constraint only — a test that casts the return value to Set<string> can call .add() or .delete() and mutate the runtime whitelist, causing subtle cross-test pollution.

Either return a true copy, or update the comment to clarify that ReadonlySet is the only guard in place:

// Option A: return a real copy so the comment is accurate
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
  return new Set(NON_RETRYABLE_CLIENT_ERROR_CATEGORIES);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/errors.ts
Line: 715-722

Comment:
**Misleading "read-only copy" comment; actual object reference is returned**

The JSDoc says "这是只读副本" (this is a read-only copy), but the function returns the live `NON_RETRYABLE_CLIENT_ERROR_CATEGORIES` object, not a copy. `ReadonlySet<string>` is a TypeScript compile-time constraint only — a test that casts the return value to `Set<string>` can call `.add()` or `.delete()` and mutate the runtime whitelist, causing subtle cross-test pollution.

Either return a true copy, or update the comment to clarify that `ReadonlySet` is the only guard in place:

```ts
// Option A: return a real copy so the comment is accurate
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
  return new Set(NON_RETRYABLE_CLIENT_ERROR_CATEGORIES);
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +594 to +609
const NON_RETRYABLE_CLIENT_ERROR_CATEGORIES = new Set<string>([
"prompt_limit",
"input_limit",
"context_limit",
"token_limit",
"content_filter",
"validation_error",
"thinking_error",
"parameter_error",
"pdf_limit",
"media_limit",
"cache_limit",
"invalid_request",
"model_error",
"store_error",
]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Whitelist is a silent maintenance hazard when new default categories are added

The set is hard-coded and has no compile-time link to DEFAULT_ERROR_RULES. If a future default rule is added with a new category (e.g. "citation_limit") and the whitelist is not updated, that rule's errors will silently fall back to PROVIDER_ERROR/SYSTEM_ERROR instead of short-circuiting retries — the opposite of the current bug, but still incorrect.

A lightweight guard could be a unit test that asserts every category used by DEFAULT_ERROR_RULES is present in NON_RETRYABLE_CLIENT_ERROR_CATEGORIES, catching drift at CI time rather than in production.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/errors.ts
Line: 594-609

Comment:
**Whitelist is a silent maintenance hazard when new default categories are added**

The set is hard-coded and has no compile-time link to `DEFAULT_ERROR_RULES`. If a future default rule is added with a new category (e.g. `"citation_limit"`) and the whitelist is not updated, that rule's errors will silently fall back to `PROVIDER_ERROR`/`SYSTEM_ERROR` instead of short-circuiting retries — the opposite of the current bug, but still incorrect.

A lightweight guard could be a unit test that asserts every category used by `DEFAULT_ERROR_RULES` is present in `NON_RETRYABLE_CLIENT_ERROR_CATEGORIES`, catching drift at CI time rather than in production.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/app/v1/_lib/proxy/errors.ts (1)

594-609: 白名单与 DEFAULT_ERROR_RULES 存在静默漂移风险

当前白名单是硬编码集合,PR 描述声明其与 DEFAULT_ERROR_RULES 对齐,但二者并未在代码层面建立强约束。若后续向 DEFAULT_ERROR_RULES 新增/调整 category(例如新增一个真正的客户端输入错误 category),此处容易被遗漏,导致默认规则匹配后被错误地降级为 PROVIDER_ERROR,复现本 PR 试图修复的反向问题。

建议二选一:

  1. DEFAULT_ERROR_RULES 中派生白名单(取所有 isDefault=true 且语义为客户端输入错误的 category),并辅以注释说明排除项;
  2. 增加一个轻量单测,断言 DEFAULT_ERROR_RULES 中所有“客户端输入类” category 都包含于本白名单(或反向断言),以保证未来变更时 CI 主动告警。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/v1/_lib/proxy/errors.ts` around lines 594 - 609, The hardcoded
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES set can drift from DEFAULT_ERROR_RULES;
update the implementation to derive the whitelist programmatically from
DEFAULT_ERROR_RULES (e.g., filter entries where isDefault===true and
semantically denote client/input errors) and document any explicit exclusions,
or alternatively add a unit test that asserts all client-input categories in
DEFAULT_ERROR_RULES are included in NON_RETRYABLE_CLIENT_ERROR_CATEGORIES (or
the inverse) so CI fails on future mismatches; reference DEFAULT_ERROR_RULES and
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES to locate where to implement the
derivation or add the test.
tests/unit/proxy/error-category-aware-classification.test.ts (2)

344-362: 同步路径的覆盖建议增加“白名单命中 -> true”用例

当前同步用例仅验证了 service_error(非白名单)在缓存命中时返回 false 的方向。建议追加一个对称用例:先用白名单 category(如 validation_error)的规则预热缓存,断言 isNonRetryableClientError 返回 true,避免未来若有人误改 shouldTreatMatchedRuleAsNonRetryable 在同步分支的语义而无回归保护。

建议追加用例
   test("sync version respects white-list when cache hit", async () => {
     // ...existing assertion (false direction)...
   });
+
+  test("sync version returns true when cached match has white-listed category", async () => {
+    await loadDetectorWithRules([
+      buildRule({
+        id: 41,
+        pattern: "validation-token",
+        matchType: "contains",
+        category: "validation_error",
+      }),
+    ]);
+
+    const { isNonRetryableClientError, isNonRetryableClientErrorAsync } = await import(
+      "@/app/v1/_lib/proxy/errors"
+    );
+    const error = new Error("upstream said: validation-token");
+    await isNonRetryableClientErrorAsync(error);
+    expect(isNonRetryableClientError(error)).toBe(true);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/proxy/error-category-aware-classification.test.ts` around lines
344 - 362, Add a symmetric test that primes the cache via the async path with a
white-list category rule and asserts the sync path returns true: use
loadDetectorWithRules with buildRule({ id: X, pattern: "validation-token",
matchType: "contains", category: "validation_error" }), call
isNonRetryableClientErrorAsync(new Error("...validation-token")) to warm the
cache, then assert isNonRetryableClientError(new Error("...validation-token"))
=== true; place this alongside the existing "sync version respects white-list
when cache hit" test and reference the same functions isNonRetryableClientError
and isNonRetryableClientErrorAsync to protect the sync branch semantics.

104-112: setTimeout(0) 的意图建议在注释中说明或移除

await new Promise((resolve) => setTimeout(resolve, 0)); 紧跟在设置 mock 之后、reload() 之前,意图不明:是等待某个微任务/订阅注册,还是历史遗留?若用于等待 subscribeCacheInvalidation 的异步副作用,请补一行注释说明;若实际并无作用,可直接移除以减少测试中的玄学等待。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/proxy/error-category-aware-classification.test.ts` around lines
104 - 112, The zero-delay timer in loadDetectorWithRules (the await new
Promise(resolve => setTimeout(resolve, 0)) placed after mocking
mocks.getActiveErrorRules and before calling errorRuleDetector.reload()) is
unclear: either add a one-line comment explaining it waits for a microtask/async
subscription (e.g. to allow subscribeCacheInvalidation or other async
side-effects to register) or remove the timer if it has no effect;
alternatively, replace it with an explicit, intention-revealing helper (e.g.
flushPromises or await nextTick) and keep errorRuleDetector.reload() and
mocks.getActiveErrorRules references intact so the test deterministically waits
for the needed async registration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/v1/_lib/proxy/errors.ts`:
- Around line 715-722: The JSDoc incorrectly claims
getNonRetryableClientErrorCategoriesForTesting returns a read-only copy while it
currently returns the original Set reference
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES; either return an actual copy (e.g.,
construct and return a new Set from NON_RETRYABLE_CLIENT_ERROR_CATEGORIES) to
make it immutable at runtime, or update the JSDoc to state that the function
returns the same runtime Set and only TypeScript's ReadonlySet typing prevents
mutation; modify getNonRetryableClientErrorCategoriesForTesting and/or its JSDoc
accordingly and keep the symbol NON_RETRYABLE_CLIENT_ERROR_CATEGORIES as the
source of truth.

---

Nitpick comments:
In `@src/app/v1/_lib/proxy/errors.ts`:
- Around line 594-609: The hardcoded NON_RETRYABLE_CLIENT_ERROR_CATEGORIES set
can drift from DEFAULT_ERROR_RULES; update the implementation to derive the
whitelist programmatically from DEFAULT_ERROR_RULES (e.g., filter entries where
isDefault===true and semantically denote client/input errors) and document any
explicit exclusions, or alternatively add a unit test that asserts all
client-input categories in DEFAULT_ERROR_RULES are included in
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES (or the inverse) so CI fails on future
mismatches; reference DEFAULT_ERROR_RULES and
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES to locate where to implement the
derivation or add the test.

In `@tests/unit/proxy/error-category-aware-classification.test.ts`:
- Around line 344-362: Add a symmetric test that primes the cache via the async
path with a white-list category rule and asserts the sync path returns true: use
loadDetectorWithRules with buildRule({ id: X, pattern: "validation-token",
matchType: "contains", category: "validation_error" }), call
isNonRetryableClientErrorAsync(new Error("...validation-token")) to warm the
cache, then assert isNonRetryableClientError(new Error("...validation-token"))
=== true; place this alongside the existing "sync version respects white-list
when cache hit" test and reference the same functions isNonRetryableClientError
and isNonRetryableClientErrorAsync to protect the sync branch semantics.
- Around line 104-112: The zero-delay timer in loadDetectorWithRules (the await
new Promise(resolve => setTimeout(resolve, 0)) placed after mocking
mocks.getActiveErrorRules and before calling errorRuleDetector.reload()) is
unclear: either add a one-line comment explaining it waits for a microtask/async
subscription (e.g. to allow subscribeCacheInvalidation or other async
side-effects to register) or remove the timer if it has no effect;
alternatively, replace it with an explicit, intention-revealing helper (e.g.
flushPromises or await nextTick) and keep errorRuleDetector.reload() and
mocks.getActiveErrorRules references intact so the test deterministically waits
for the needed async registration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6f0f976-c382-430b-a66c-3e084bad6afb

📥 Commits

Reviewing files that changed from the base of the PR and between 4ec2d17 and e41f0af.

📒 Files selected for processing (2)
  • src/app/v1/_lib/proxy/errors.ts
  • tests/unit/proxy/error-category-aware-classification.test.ts

Comment on lines +715 to 722
/**
* 暴露内部白名单常量供测试使用
*
* 注意:这是只读副本,外部不应修改运行时白名单。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

JSDoc 与实现不一致:返回的不是“只读副本”

注释写的是“只读副本,外部不应修改运行时白名单”,但函数实际返回的是底层 Set 的同一引用,仅依赖 ReadonlySet<string> 的类型断言来阻止修改。调用方一旦使用 as Set<string> 类型断言或 JS 调用,就能直接修改运行时白名单,造成全局副作用。

建议二选一(任选其一即可消除歧义):

方案 A:真正返回副本
 export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
-  return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
+  return new Set(NON_RETRYABLE_CLIENT_ERROR_CATEGORIES);
 }
方案 B:保留同一引用,但修正注释,明确仅靠类型层面保护
-/**
- * 暴露内部白名单常量供测试使用
- *
- * 注意:这是只读副本,外部不应修改运行时白名单。
- */
+/**
+ * 暴露内部白名单常量供测试使用
+ *
+ * 注意:返回值通过 ReadonlySet 类型仅在编译期阻止修改,
+ * 运行时仍是同一引用;测试中切勿通过类型断言绕过只读约束。
+ */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* 暴露内部白名单常量供测试使用
*
* 注意:这是只读副本,外部不应修改运行时白名单。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
}
/**
* 暴露内部白名单常量供测试使用
*
* 注意:这是只读副本,外部不应修改运行时白名单。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return new Set(NON_RETRYABLE_CLIENT_ERROR_CATEGORIES);
}
Suggested change
/**
* 暴露内部白名单常量供测试使用
*
* 注意:这是只读副本,外部不应修改运行时白名单。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
}
/**
* 暴露内部白名单常量供测试使用
*
* 注意:返回值通过 ReadonlySet 类型仅在编译期阻止修改,
* 运行时仍是同一引用;测试中切勿通过类型断言绕过只读约束。
*/
export function getNonRetryableClientErrorCategoriesForTesting(): ReadonlySet<string> {
return NON_RETRYABLE_CLIENT_ERROR_CATEGORIES;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/v1/_lib/proxy/errors.ts` around lines 715 - 722, The JSDoc
incorrectly claims getNonRetryableClientErrorCategoriesForTesting returns a
read-only copy while it currently returns the original Set reference
NON_RETRYABLE_CLIENT_ERROR_CATEGORIES; either return an actual copy (e.g.,
construct and return a new Set from NON_RETRYABLE_CLIENT_ERROR_CATEGORIES) to
make it immutable at runtime, or update the JSDoc to state that the function
returns the same runtime Set and only TypeScript's ReadonlySet typing prevents
mutation; modify getNonRetryableClientErrorCategoriesForTesting and/or its JSDoc
accordingly and keep the symbol NON_RETRYABLE_CLIENT_ERROR_CATEGORIES as the
source of truth.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a category-aware whitelist for non-retryable client errors in the proxy error handling logic. By restricting non-retryable status to specific categories like validation_error and prompt_limit, the system ensures that service or network-related errors correctly trigger failover and circuit-breaker mechanisms rather than stopping immediately. The changes include updates to both synchronous and asynchronous error detection functions and a new suite of unit tests to verify the classification logic and prevent regressions. I have no feedback to provide.

@github-actions github-actions Bot added the size/M Medium PR (< 500 lines) label Apr 25, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR introduces a category-aware white-list for NON_RETRYABLE_CLIENT_ERROR classification. The fix is precisely scoped: a new NON_RETRYABLE_CLIENT_ERROR_CATEGORIES set and a shouldTreatMatchedRuleAsNonRetryable helper replace the previous result.matched short-circuit in both sync and async code paths. The white-list's 14 categories match DEFAULT_ERROR_RULES exactly. Test coverage is solid (10 tests including data-driven enumeration of all white-listed categories, regression tests for the reported production scenario, and sync-path cache-hit verification). No issues with confidence >= 80 were identified.

PR Size: M

  • Lines changed: 438 (435 additions, 3 deletions)
  • Files changed: 2

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate
  • Code clarity - Good

Automated review by Claude AI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core area:Error Rule bug Something isn't working size/M Medium PR (< 500 lines)

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant