fix(proxy): tighten numeric error status boundaries#1104
Conversation
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
📝 WalkthroughWalkthrough更新上游错误检测正则以改进数值边界判定,并新增两套单元测试以验证状态码/错误码识别与默认规则的数值边界不会产生意外命中。 Changes
Estimated code review effort🎯 3 (中等) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refines the regex patterns used for upstream error detection to prevent false positives when status codes appear as substrings in larger numbers, such as request IDs or prices. It also introduces unit tests to verify these numeric boundary conditions. Feedback suggests refining the regex patterns further by combining word boundaries with negative lookaheads to maintain strictness (e.g., preventing matches like '429abc'). Additionally, it is recommended to expand the new boundary tests to include 'contains' and 'exact' match types in addition to 'regex'.
| statusCode: 429, | ||
| matcherId: "rate_limit", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b|\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b|超出频率|请求过于频繁|限流|稍后重试)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\d|\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, |
There was a problem hiding this comment.
The replacement of \b with (?!\d|\.\d) is slightly less strict than the original word boundary because it allows word characters like letters or underscores (e.g., 429abc would now match).
To achieve the goal of excluding decimal suffixes while maintaining the word boundary check, you can use \b(?!\.\d). This ensures that the match ends at a word boundary, but specifically fails if that boundary is a dot followed by a digit (a decimal).
Additionally, for the standalone 429 match, using (?<![\d.]) without \b allows matches preceded by letters (e.g., a429). Combining them as (?<![\d.])\b is safer. This logic applies to all other matchers in this array as well.
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\d|\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, | |
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b(?!\.\d)|(?<![\d.])\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b(?!\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, |
| const regexRules = (await loadDefaultRules()).filter((rule) => rule.matchType === "regex"); | ||
| const samples = ["request id: 202604250550399959", "需要预扣费额度:¥0.352942"]; | ||
|
|
||
| const accidentalMatches = regexRules.flatMap((rule) => { | ||
| const pattern = new RegExp(rule.pattern, "i"); | ||
| return samples | ||
| .filter((sample) => pattern.test(sample)) | ||
| .map((sample) => ({ category: rule.category, pattern: rule.pattern, sample })); | ||
| }); |
There was a problem hiding this comment.
The current test only validates regex match types. It would be more robust to also check contains and exact rules, as a simple contains: "503" rule would also cause the false positive issue with request IDs or prices.
const allRules = await loadDefaultRules();
const samples = ["request id: 202604250550399959", "需要预扣费额度:¥0.352942"];
const accidentalMatches = allRules.flatMap((rule) => {
return samples
.filter((sample) => {
if (rule.matchType === "regex") {
return new RegExp(rule.pattern, "i").test(sample);
}
if (rule.matchType === "contains") {
return sample.toLowerCase().includes(rule.pattern.toLowerCase());
}
if (rule.matchType === "exact") {
return sample.toLowerCase() === rule.pattern.toLowerCase();
}
return false;
})
.map((sample) => ({ category: rule.category, pattern: rule.pattern, sample }));
});There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb490c6c1d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| statusCode: 429, | ||
| matcherId: "rate_limit", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b|\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b|超出频率|请求过于频繁|限流|稍后重试)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\d|\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, |
There was a problem hiding this comment.
Restore strict token boundary after numeric status codes
Replacing the trailing \b with (?!\d|\.\d) makes these matchers accept alphabetic suffixes, so malformed tokens like HTTP/1.1 429abc or Error 1015x now infer an upstream status instead of being ignored. This is a regression from the previous boundary behavior and can skew fake-200 status inference used for failover/circuit-breaker accounting. Keep the decimal/long-number guard, but also require a non-word boundary after the numeric token (e.g., \b or (?![\w.])).
Useful? React with 👍 / 👎.
🧪 测试结果
总体结果: ✅ 所有测试通过 |
| it("does not infer overloaded status from a decimal price containing 529", () => { | ||
| const text = "需要预扣费额度:¥0.352942"; | ||
|
|
||
| expect(inferUpstreamErrorStatusCodeFromText(text)).toBeNull(); | ||
| }); |
There was a problem hiding this comment.
Test title references a non-existent status code
The test description says "overloaded status from a decimal price containing 529", but 529 is not a code covered by any entry in ERROR_STATUS_MATCHERS, so the inline explanation is misleading for future maintainers. The test correctly returns null today because no pattern targets 529, but the comment implies an existing risk that doesn't exist yet — the real protection being exercised here is against patterns in the database-stored default rules (covered by the second test file). Rewording to "does not infer any error status from a decimal price whose digits contain a known error-code substring" would be more accurate.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/lib/upstream-error-detection-status.test.ts
Line: 90-94
Comment:
**Test title references a non-existent status code**
The test description says "overloaded status from a decimal price containing 529", but `529` is not a code covered by any entry in `ERROR_STATUS_MATCHERS`, so the inline explanation is misleading for future maintainers. The test correctly returns `null` today because no pattern targets `529`, but the comment implies an existing risk that doesn't exist yet — the real protection being exercised here is against patterns in the *database-stored* default rules (covered by the second test file). Rewording to "does not infer any error status from a decimal price whose digits contain a known error-code substring" would be more accurate.
How can I resolve this? If you propose a fix, please make it concise.| statusCode: 429, | ||
| matcherId: "rate_limit", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b|\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b|超出频率|请求过于频繁|限流|稍后重试)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\d|\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, |
There was a problem hiding this comment.
Asymmetric left-boundary guard for the standalone
429 alternative
The 429 regex is the only matcher that adds a left-side lookbehind ((?<![\d.])) on the standalone status-number alternative, while all other status codes rely solely on the \bHTTP\/\d… prefix (which implicitly anchors the left side). The 429 too many requests phrase is indeed unique in that it can appear without the HTTP prefix, but \b429 would already prevent matching when the preceding character is a digit. The additional (?<![\d.]) guard is stricter: it also blocks a preceding literal ., which \b would allow (e.g., $0.429 too many requests).
This is likely intentional and correct, but it's the only case with this extra guard and worth a brief inline comment explaining why, so the asymmetry doesn't look like an oversight to future editors.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 98
Comment:
**Asymmetric left-boundary guard for the standalone `429` alternative**
The 429 regex is the only matcher that adds a left-side lookbehind (`(?<![\d.])`) on the standalone status-number alternative, while all other status codes rely solely on the `\bHTTP\/\d…` prefix (which implicitly anchors the left side). The `429 too many requests` phrase is indeed unique in that it can appear without the HTTP prefix, but `\b429` would already prevent matching when the preceding character is a digit. The additional `(?<![\d.])` guard is stricter: it also blocks a preceding literal `.`, which `\b` would allow (e.g., `$0.429 too many requests`).
This is likely intentional and correct, but it's the only case with this extra guard and worth a brief inline comment explaining why, so the asymmetry doesn't look like an oversight to future editors.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| matcherId: "legal_restriction", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+451\b|\bunavailable\s+for\s+legal\s+reasons\b|\bexport\s+control\b|\bsanctions?\b|法律原因不可用|合规限制|出口管制)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+451(?!\d|\.\d)|\bunavailable\s+for\s+legal\s+reasons\b|\bexport\s+control\b|\bsanctions?\b|法律原因不可用|合规限制|出口管制)/iu, |
There was a problem hiding this comment.
(?!\d|\.\d) does not guard against a trailing bare period
The negative lookahead (?!\d|\.\d) prevents 451.7 (. + digit) from matching, but HTTP/1.1 451. (period followed by a non-digit, e.g. end-of-sentence) would still match. The same applies to every matcher in the array. In practice the sentence-ending period case ("...received 451.") is harmless and the old \b had the same behaviour, so this is not a regression — just worth noting that the fix is specifically scoped to decimal-fraction false positives, not bare trailing dots.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 147-148
Comment:
**`(?!\d|\.\d)` does not guard against a trailing bare period**
The negative lookahead `(?!\d|\.\d)` prevents `451.7` (`.` + digit) from matching, but `HTTP/1.1 451.` (period followed by a non-digit, e.g. end-of-sentence) would still match. The same applies to every matcher in the array. In practice the sentence-ending period case (`"...received 451."`) is harmless and the old `\b` had the same behaviour, so this is not a regression — just worth noting that the fix is specifically scoped to decimal-fraction false positives, not bare trailing dots.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/unit/repository/error-rules-default-numeric-boundaries.test.ts (2)
6-10:CapturedDefaultRule类型相对实际写入字段偏窄,建议明确收窄意图或补全字段。
syncDefaultErrorRules在tx.insert(errorRules).values(rule)时会传入完整 rule(priority、overrideResponse、isEnabled、isDefault等),但这里的捕获类型只声明了pattern / matchType / category三个字段,运行时虽然不会出问题,但读起来像是在断言"只传这三个字段"。如果意图就是"本测试只关心这三字段",可以在类型注释里点一句;如果担心未来字段被误改,可补完类型让 TS 在
tx.insert(...).values(rule)形态变化时给出反馈。当前写法可接受,仅作 nit。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts` around lines 6 - 10, CapturedDefaultRule is too narrow compared to the actual object passed into syncDefaultErrorRules (tx.insert(errorRules).values(rule))—either document that the test intentionally only asserts pattern/matchType/category or expand CapturedDefaultRule to include the other real fields (e.g., priority, overrideResponse, isEnabled, isDefault) so TypeScript will catch future shape changes; update the type declaration for CapturedDefaultRule (or add an inline comment above it) and ensure it matches the shape used by tx.insert(...).values(rule).
113-118: 可选优化:在 flatMap 外层一次性编译 RegExp。
new RegExp(rule.pattern, "i")在 flatMap 内每条规则重新构造,对当前样本量级(默认规则数 × 2 个样本)无可见性能影响,但可读性上把"编译"和"测试"分开会更清晰;同时也方便后续把样本扩展到更多场景:♻️ 可选 refactor
- const accidentalMatches = regexRules.flatMap((rule) => { - const pattern = new RegExp(rule.pattern, "i"); - return samples - .filter((sample) => pattern.test(sample)) - .map((sample) => ({ category: rule.category, pattern: rule.pattern, sample })); - }); + const compiled = regexRules.map((rule) => ({ ...rule, regex: new RegExp(rule.pattern, "i") })); + const accidentalMatches = compiled.flatMap((rule) => + samples + .filter((sample) => rule.regex.test(sample)) + .map((sample) => ({ category: rule.category, pattern: rule.pattern, sample })), + );注:静态分析对该行报了 ReDoS(CWE-1333)。这里
rule.pattern来源于仓库内置的DEFAULT_ERROR_RULES,不是外部输入,且仅在单测里编译,不构成真实的 ReDoS 风险,可以忽略该告警。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts` around lines 113 - 118, The code repeatedly constructs RegExp inside the flatMap for each rule; extract compilation by mapping regexRules to a new array of objects like { category, pattern, regex: new RegExp(pattern, "i") } (referencing regexRules, accidentalMatches, samples and pattern) and then use that compiled regex in the flatMap/test loop so RegExp instantiation happens once per rule and the testing step is separated from compilation.src/lib/utils/upstream-error-detection.ts (2)
98-98: 429 分支的边界改写存在一处副作用,建议确认是否符合预期。第二分支由(推测原状)
\b429\s+too\s+many\s+requests\b改为(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b后:
- 对
.429 too many requests这类小数前缀场景比\b更严格(符合 PR 目标);- 但对
user429 too many requests这类字母直接相连的场景反而比\b更宽松(\b在两个 word 字符之间不成立而拒绝匹配;(?<![\d.])因为前一字符是字母而通过)。实际上游错误文本里
字母+429+空格+too many requests的概率应该极低,且本分支仍然要求too many requests关键词共现,误判风险有限。如果你希望同时覆盖这两类场景,可以把锚定改为(?<![\w.])429(?!\d|\.\d),让数字、点号、字母三类前缀都被排除:🛡️ 可选加固(同时排除字母前缀)
- re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|... + re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\w.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/upstream-error-detection.ts` at line 98, The 429-branch regex in upstream-error-detection.ts tightened the left boundary from \b to (?<![\d.]) which still allows letter-prefixed matches (e.g., "user429 too many requests"); update that lookbehind to also exclude word characters by using (?<![\w.]) for the specific subpattern that currently reads (?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b so it rejects prefixes of digits, dots, or letters while preserving the intended stricter numeric/dot exclusion behavior.
94-170: 数字边界收紧实现正确,逻辑符合预期。
(?!\d|\.\d)在所有HTTP/<version> <status>与Error <code>分支上有效阻止了503.12/52312之类的延伸匹配;429 分支额外加了(?<![\d.])429来防止小数前缀(如.429)误命中,方向正确。可选的小重构:
(?!\d|\.\d)在语义上等价于更紧凑的(?!\.?\d),每条规则可少一个分支节点,可读性也略好;如果倾向于显式分写两种情况,保持现状也完全可以。例如对 429 行:♻️ 可选简化(保留行为不变)
- re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\d|\.\d)|(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\d|\.\d)|超出频率|请求过于频繁|限流|稍后重试)/iu, + re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?!\.?\d)|(?<![\d.])429(?!\.?\d)\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?!\.?\d)|超出频率|请求过于频繁|限流|稍后重试)/iu,其余 14 条规则同理可统一替换。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/upstream-error-detection.ts` around lines 94 - 170, The negative lookahead (?!\d|\.\d) used in the HTTP/<version> <status> and Error <code> branches can be simplified to the equivalent, more compact pattern (?!\.?\d) for improved readability; update the RegExp literals inside ERROR_STATUS_MATCHERS (e.g., entries with matcherId "rate_limit", "payment_required", "unauthorized", "forbidden", "not_found", "payload_too_large", "unsupported_media_type", "conflict", "unprocessable_entity", "request_timeout", "legal_restriction", "service_unavailable", "gateway_timeout", "internal_server_error", "bad_request") by replacing occurrences of (?!\d|\.\d) with (?!\.?\d) so behavior remains identical but regexes are slightly simpler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts`:
- Around line 109-121: The test "default regex rules do not match numeric
substrings in request ids or prices" currently only asserts accidentalMatches is
empty which can yield a false positive if no regex rules were loaded; after
computing regexRules (from loadDefaultRules()) add an assertion that
regexRules.length is greater than 0 (e.g.,
expect(regexRules.length).toBeGreaterThan(0)) to ensure at least one regex rule
from DEFAULT_ERROR_RULES was compiled and actually exercised before checking
accidentalMatches.
---
Nitpick comments:
In `@src/lib/utils/upstream-error-detection.ts`:
- Line 98: The 429-branch regex in upstream-error-detection.ts tightened the
left boundary from \b to (?<![\d.]) which still allows letter-prefixed matches
(e.g., "user429 too many requests"); update that lookbehind to also exclude word
characters by using (?<![\w.]) for the specific subpattern that currently reads
(?<![\d.])429(?!\d|\.\d)\s+too\s+many\s+requests\b so it rejects prefixes of
digits, dots, or letters while preserving the intended stricter numeric/dot
exclusion behavior.
- Around line 94-170: The negative lookahead (?!\d|\.\d) used in the
HTTP/<version> <status> and Error <code> branches can be simplified to the
equivalent, more compact pattern (?!\.?\d) for improved readability; update the
RegExp literals inside ERROR_STATUS_MATCHERS (e.g., entries with matcherId
"rate_limit", "payment_required", "unauthorized", "forbidden", "not_found",
"payload_too_large", "unsupported_media_type", "conflict",
"unprocessable_entity", "request_timeout", "legal_restriction",
"service_unavailable", "gateway_timeout", "internal_server_error",
"bad_request") by replacing occurrences of (?!\d|\.\d) with (?!\.?\d) so
behavior remains identical but regexes are slightly simpler.
In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts`:
- Around line 6-10: CapturedDefaultRule is too narrow compared to the actual
object passed into syncDefaultErrorRules
(tx.insert(errorRules).values(rule))—either document that the test intentionally
only asserts pattern/matchType/category or expand CapturedDefaultRule to include
the other real fields (e.g., priority, overrideResponse, isEnabled, isDefault)
so TypeScript will catch future shape changes; update the type declaration for
CapturedDefaultRule (or add an inline comment above it) and ensure it matches
the shape used by tx.insert(...).values(rule).
- Around line 113-118: The code repeatedly constructs RegExp inside the flatMap
for each rule; extract compilation by mapping regexRules to a new array of
objects like { category, pattern, regex: new RegExp(pattern, "i") } (referencing
regexRules, accidentalMatches, samples and pattern) and then use that compiled
regex in the flatMap/test loop so RegExp instantiation happens once per rule and
the testing step is separated from compilation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fd59a1d6-f599-409c-b44b-c3acbfd8284d
📒 Files selected for processing (3)
src/lib/utils/upstream-error-detection.tstests/unit/lib/upstream-error-detection-status.test.tstests/unit/repository/error-rules-default-numeric-boundaries.test.ts
There was a problem hiding this comment.
Code Review Summary
No significant issues identified in this PR.
PR Size: M
- Lines changed: 247
- Files changed: 3
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate
- Code clarity - Good
Automated review by Codex AI
There was a problem hiding this comment.
Reviewed PR #1104 (fix(proxy): tighten numeric error status boundaries).
- PR Size:
M(247 lines changed across 3 files); applied labelsize/M. - Inline comments: none (no diff-line issues met the reporting threshold).
- Review summary: submitted via
gh pr review --commenton PR#1104.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd1fb2e479
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| statusCode: 429, | ||
| matcherId: "rate_limit", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b|\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b|超出频率|请求过于频繁|限流|稍后重试)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?![\p{L}\p{N}_.])|(?<![\p{L}\p{N}_.])429(?![\p{L}\p{N}_.])\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?![\p{L}\p{N}_.])|超出频率|请求过于频繁|限流|稍后重试)/iu, |
There was a problem hiding this comment.
Permit sentence punctuation after status tokens
Changing the trailing boundary from \b to (?![\p{L}\p{N}_.]) now rejects status tokens followed by a period, so bodies like "upstream returned HTTP/1.1 503." (or "Error 521.") no longer infer the upstream status and fall back to 502. This is a regression from the previous matcher behavior and can reduce the accuracy of fake-200 failure classification when providers/loggers end sentences with punctuation; the decimal false-positive case can be blocked without dropping bare punctuation matches.
Useful? React with 👍 / 👎.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/repository/error-rules-default-numeric-boundaries.test.ts (1)
119-119: 可选:扩展样本以覆盖更多潜在数字边界场景。当前只用两个样本(请求 ID 与价格小数)做回归,覆盖了 PR 提到的两类已知问题。若希望对未来 pattern 调整提供更强的护栏,可补充几个临界样本(如纯数字前后紧邻字母/标点、
HTTP/1.1、v429、5031等),进一步约束(?!\d|\.\d)与(?<![\d.])边界的覆盖范围。当前实现不阻塞合并。♻️ 可选样本扩充
- const samples = ["request id: 202604250550399959", "需要预扣费额度:¥0.352942"]; + const samples = [ + "request id: 202604250550399959", + "需要预扣费额度:¥0.352942", + "price: 503.12 USD", + "version v429 build 5031", + "trace=4290abc", + ];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts` at line 119, 当前 tests/unit/repository/error-rules-default-numeric-boundaries.test.ts 中 samples 数组只包含两个样本,建议在 samples(变量名 samples)中增加更多边界样例以覆盖 `(?!\d|\.\d)` 与 `(?<![\d.])` 这类正则断言的临界场景;具体新增示例可包括纯数字与字母相邻(如 "v429"、"5031")、带小数点的协议/版本(如 "HTTP/1.1")、数字前后紧邻标点或字母(如 "id123abc"、"123,456"、"price:¥0.35")等,以更全面验证 Regex 策略(不阻塞合并,属于可选增强)。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit/repository/error-rules-default-numeric-boundaries.test.ts`:
- Line 119: 当前
tests/unit/repository/error-rules-default-numeric-boundaries.test.ts 中 samples
数组只包含两个样本,建议在 samples(变量名 samples)中增加更多边界样例以覆盖 `(?!\d|\.\d)` 与 `(?<![\d.])`
这类正则断言的临界场景;具体新增示例可包括纯数字与字母相邻(如 "v429"、"5031")、带小数点的协议/版本(如
"HTTP/1.1")、数字前后紧邻标点或字母(如 "id123abc"、"123,456"、"price:¥0.35")等,以更全面验证 Regex
策略(不阻塞合并,属于可选增强)。
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ff1c1392-27e8-4463-ab59-0c4ea65bbcdd
📒 Files selected for processing (1)
tests/unit/repository/error-rules-default-numeric-boundaries.test.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b73501014
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| statusCode: 429, | ||
| matcherId: "rate_limit", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429\b|\b429\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015\b|超出频率|请求过于频繁|限流|稍后重试)/iu, | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?![\p{L}\p{N}_.])|(?<![\p{L}\p{N}_.])429(?![\p{L}\p{N}_.])\s+too\s+many\s+requests\b|\btoo\s+many\s+requests\b|\brate\s*limit(?:ed|ing)?\b|\bthrottl(?:e|ed|ing)\b|\bretry-after\b|\bRESOURCE_EXHAUSTED\b|\bRequestLimitExceeded\b|\bThrottling(?:Exception)?\b|\bError\s*1015(?![\p{L}\p{N}_.])|超出频率|请求过于频繁|限流|稍后重试)/iu, |
There was a problem hiding this comment.
Allow sentence punctuation after numeric status tokens
The new boundary check (?![\p{L}\p{N}_.]) blocks a trailing . after matched numeric tokens, so inputs like HTTP/1.1 503. and Error 521. now fail status inference even though they are standalone status/code tokens followed by sentence punctuation. In fake-200 handling this falls back to the generic 502 path and reduces classification accuracy for provider messages or logs that end with periods. You can keep the decimal/embedded-number guard while still permitting punctuation-only terminators.
Useful? React with 👍 / 👎.
🧪 测试结果
总体结果: ✅ 所有测试通过 |
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/utils/upstream-error-detection.ts (1)
94-170: 正则边界收紧实现正确,建议抽取重复的负向先行模式以便集中维护。负向先行
(?![\p{L}\p{N}_]|\.\d)与 429 标准独立分支的负向后行(?<![\p{L}\p{N}_.])设计是合理的——前者拒绝紧邻的标记字符或小数后缀(同时允许句尾.),后者拒绝前置的数字 / 下划线 / 小数点(前置.一定是小数上下文)。15 个 HTTP 状态码与 5 个 Cloudflare 错误码全部覆盖,配合/iu标志即可使用\p{L}\p{N}Unicode 字符类。唯一可优化点:
(?![\p{L}\p{N}_]|\.\d)字面在 20 处出现,若以后需要调整边界语义(如增加对十六进制字符的拒绝、放开下划线等),需要同步修改全部 20 处,存在漏改风险。♻️ 可选:抽取边界片段为常量,使用 RegExp 构造器拼装
+// 数字 token 后的边界:拒绝紧邻的字母/数字/下划线,以及小数后缀(如 .12); +// 允许句尾 `.`、空白、标点等正常分隔符。统一在此调整以保持 20 处规则一致。 +const NUMERIC_TOKEN_TAIL_BOUNDARY = "(?![\\p{L}\\p{N}_]|\\.\\d)"; +const NB = NUMERIC_TOKEN_TAIL_BOUNDARY; + const ERROR_STATUS_MATCHERS: Array<{ statusCode: number; matcherId: string; re: RegExp }> = [ { statusCode: 429, matcherId: "rate_limit", - re: /(?:\bHTTP\/\d(?:\.\d)?\s+429(?![\p{L}\p{N}_]|\.\d)|(?<![\p{L}\p{N}_.])429(?![\p{L}\p{N}_]|\.\d)\s+too\s+many\s+requests\b|...|\bError\s*1015(?![\p{L}\p{N}_]|\.\d)|超出频率|...)/iu, + re: new RegExp( + `(?:\\bHTTP\\/\\d(?:\\.\\d)?\\s+429${NB}|(?<![\\p{L}\\p{N}_.])429${NB}\\s+too\\s+many\\s+requests\\b|...|\\bError\\s*1015${NB}|超出频率|...)`, + "iu", + ), }, // ... 其余 14 条同理 ];如果不希望引入字符串拼装的可读性成本,保留现状也是可以接受的——这只是降低未来维护误差的预防性建议。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/utils/upstream-error-detection.ts` around lines 94 - 170, The repeated negative-lookahead boundary fragment "(?![\\p{L}\\p{N}_]|\\.\\d)" used across ERROR_STATUS_MATCHERS should be extracted to a single constant (e.g., NEGATIVE_TRAILING_BOUNDARY) and the regexes should be constructed by composing that constant (via RegExp constructor or string template assembly) instead of repeating the literal in each entry; locate the ERROR_STATUS_MATCHERS array and replace the inline fragment in each object's re with a composed regex built from the shared constant so future updates to boundary semantics only require one change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/lib/utils/upstream-error-detection.ts`:
- Around line 94-170: The repeated negative-lookahead boundary fragment
"(?![\\p{L}\\p{N}_]|\\.\\d)" used across ERROR_STATUS_MATCHERS should be
extracted to a single constant (e.g., NEGATIVE_TRAILING_BOUNDARY) and the
regexes should be constructed by composing that constant (via RegExp constructor
or string template assembly) instead of repeating the literal in each entry;
locate the ERROR_STATUS_MATCHERS array and replace the inline fragment in each
object's re with a composed regex built from the shared constant so future
updates to boundary semantics only require one change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3acab3d2-f847-4c0e-8d74-69b4027c35cf
📒 Files selected for processing (2)
src/lib/utils/upstream-error-detection.tstests/unit/lib/upstream-error-detection-status.test.ts
Summary
Tighten the built-in
ERROR_STATUS_MATCHERSregexes and guard the default error-rule catalog against false-positive numeric substring matches. Numeric HTTP status tokens (e.g.503) and Cloudflare Error codes (e.g.521,524) previously matched inside longer digit sequences such as AWS request IDs (...05503999959...) or decimal prices (0.352942), causing incorrect upstream-error-status inference.Related:
ERROR_STATUS_MATCHERSregexes introduced in the fake-200 detection featureProblem
When an upstream returns a fake-200 response body containing a request ID or price, the numeric HTTP status regexes in
ERROR_STATUS_MATCHERScould match a numeric substring and infer the wrong error status code. For example:request id: 202604250550399959contains503at positions 13-15, causingservice_unavailableinference需要预扣费额度:¥0.352942contains529inside the decimal price, potentially triggering rate-limit inferenceThe root cause was that the status-code alternations used only
\bword boundaries, which allow a match when the digits are followed by more digits or a decimal point (e.g.503.12,521.7).Solution
Replace bare
\bword-boundary assertions on numeric status tokens with negative lookaheads(?\!\d|\.\d)that reject matches followed by additional digits or decimal fractions. Apply the same boundary tightening to Cloudflare Error codes (1015,1020,521,522,524). For the429HTTP status alternation, also add a negative lookbehind(?<\![\d.])to prevent matching when preceded by digits or a decimal point (e.g. inside a price).Changes
Core Changes
src/lib/utils/upstream-error-detection.ts(+15/-15) - Replace\bboundaries with(?\!\d|\.\d)on all 15 HTTP status code alternations and 5 Cloudflare Error code alternations; add(?<\![\d.])lookbehind on the429standalone status alternationTest Coverage
tests/unit/lib/upstream-error-detection-status.test.ts(new, 95 lines) - 63 test cases covering: standalone HTTP status matches (still work), decimal-suffix rejection (503.12), longer-number rejection (50312), Cloudflare Error standalone matches, Cloudflare decimal/longer-number rejection, and regression tests for the reported request-id and price samplestests/unit/repository/error-rules-default-numeric-boundaries.test.ts(new, 122 lines) - Validates that all defaultregexerror rules loaded bysyncDefaultErrorRules()do not accidentally match the reported request-id and price samplesVerification
bunx vitest run tests/unit/lib/upstream-error-detection-status.test.ts tests/unit/repository/error-rules-default-numeric-boundaries.test.ts-> 2 files / 63 tests passedbun run lint:fix-> exit 0; fixed formatting onlybun run lint-> exit 0; 3 existing info-level suggestions in unrelated leaderboard testsbun run typecheck-> exit 0bun run build-> exit 0bunx vitest run tests/unit/public-status/config-publisher.test.ts-> exit 0 after investigating the full-suite timeoutVITEST_MAX_WORKERS=4 bun run test-> 547 files / 5136 tests passed, 13 skippedNotes
bun run testhit an unrelated 20s timeout intests/unit/public-status/config-publisher.test.ts; the same test passes alone in ~15s and the full suite passes with reduced workers, matching the repo's configurable worker guidance.Description enhanced by Claude AI
Greptile Summary
This PR tightens the numeric boundary guards in
ERROR_STATUS_MATCHERSby replacing trailing\bword-boundary assertions with(?![\p{L}\p{N}_]|\.\d)negative lookaheads on all 15 HTTP status alternations and 5 Cloudflare Error code alternations, and adds a(?<![\p{L}\p{N}_.])lookbehind on the standalone429alternative. Two new test files provide 63 parametrised unit tests for the regex changes and a mock-based integration check against the default DB rules.Confidence Score: 5/5
Safe to merge — the regex changes are mechanically uniform, fully backward-compatible, and backed by comprehensive parametrised tests.
No P0 or P1 findings. The single P2 (missing
uflag in test helper) is latent and harmless while DB rules remain free of Unicode property escapes. Core logic in the production file is correct.tests/unit/repository/error-rules-default-numeric-boundaries.test.ts — the
matchesRulehelper usesnew RegExp(pattern, "i")withoutu; benign today but could mask failures if DB rules adopt\p{…}patterns in the future.Important Files Changed
\bwith `(?![\p{L}\p{N}_]syncDefaultErrorRules()via mocks and asserts no default rule matches the two reported false-positive samples. Minor:matchesRulecompiles regexes with only the"i"flag, missing"u", which would cause a SyntaxError if any future DB rule uses Unicode property escapes.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A["Upstream fake-200 response body text"] --> B["inferUpstreamErrorStatusCodeFromText()"] B --> C["Trim + BOM strip + truncate to 64 KB"] C --> D["Iterate ERROR_STATUS_MATCHERS"] D --> E{"matcher.re.test(text)"} E -- "match" --> F["Return { statusCode, matcherId }"] E -- "no match" --> G{"More matchers?"} G -- "yes" --> D G -- "no" --> H["Return null"]Prompt To Fix All With AI
Reviews (4): Last reviewed commit: "fix(proxy): allow punctuation after nume..." | Re-trigger Greptile