修复可用性监控加载过慢#1186
Conversation
📝 WalkthroughWalkthrough将可用性服务的“终态”判定从调用数据库函数替换为直接检查 变更清单终态判定逻辑简化
估算代码审查工作量🎯 3 (Moderate) | ⏱️ ~20 minutes 可能相关的 PR
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request simplifies the logic for identifying finalized message requests by replacing a complex SQL function with a check for a non-null status code. This change, applied in the availability service and verified in unit tests, aims to prevent misclassifying active requests as failures. Feedback suggests that while this improves query performance by utilizing database indices, it may inadvertently exclude certain early-stage failures like DNS or connection timeouts, potentially leading to an overestimation of availability metrics.
There was a problem hiding this comment.
Code Review Summary
No significant issues identified in this PR. The change is a well-reasoned performance fix that restores partial index utilization by aligning the availability query's finalized condition with the existing status_code IS NOT NULL partial index predicate. The rationale for excluding intermediate states (where providerChain/errorMessage may be set but statusCode is still null) is clearly documented in the updated comment.
PR Size: XS
- Lines changed: 26
- Files changed: 2
Review Coverage
- Logic and correctness - Clean. The change correctly narrows the finalized boundary to
statusCode IS NOT NULL, which both matches the partial index predicate and avoids misclassifying in-progress requests as failed. - Security (OWASP Top 10) - Clean. No security-relevant changes; continues using Drizzle ORM parameterized SQL.
- Error handling - Clean. No new error paths introduced.
- Type safety - Clean.
isNotNullimport and usage are type-safe. - Documentation accuracy - Clean. The updated JSDoc comment accurately explains the rationale and tradeoffs.
- Test coverage - Adequate. All 5 SQL assertion tests updated to match the new predicate, plus an additional assertion added in the Gemini passthrough test.
- Code clarity - Good. The function is simplified from a multi-parameter SQL function call to a single Drizzle helper.
Automated review by Claude AI
|
补充状态:
|
|
已按 Greptile 的建议补充一轮 follow-up:
请再跑一轮 AI review。 @coderabbitai review |
|
正在触发新一轮 AI review,请稍候。 ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/lib/availability-service.test.ts (1)
42-59: 💤 Low value建议增强输入验证以防御极端情况。
虽然当前实现中
cteName已通过正则转义处理,且实际调用时仅使用硬编码的 CTE 名称("finalized_requests"、"provider_bucket_stats"),但静态分析工具仍标记了潜在的 ReDoS 风险。考虑添加输入约束验证以明确防御边界并消除静态分析警告。防御性验证建议
function findCteBoundary( queryText: string, cteName: string ): { cteStart: number; boundaryStart: number } | null { + if (!/^[a-z_][a-z0-9_]*$/i.test(cteName)) { + throw new Error(`Invalid CTE name: ${cteName}`); + } const escapedName = cteName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); const match = new RegExp(`(?:^|\\bwith\\s+|,\\s*)("?${escapedName}"?)\\s+as\\s*\\(`).exec(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/lib/availability-service.test.ts` around lines 42 - 59, The findCteBoundary function currently builds a dynamic regex from cteName; to defend against ReDoS add defensive validation up-front: validate queryText is a non-empty string and validate cteName is short (e.g. max length 100) and only contains safe characters (alphanumerics, underscore, optionally quoted double quotes), returning null or throwing if validation fails; after those checks continue to construct escapedName and run the existing RegExp logic. This keeps the existing behavior (use of escapedName, match extraction and returned cteStart/boundaryStart) but prevents long/complex inputs from being used in the regex and removes static-analysis warnings about ReDoS.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/lib/availability-service.test.ts`:
- Around line 42-59: The findCteBoundary function currently builds a dynamic
regex from cteName; to defend against ReDoS add defensive validation up-front:
validate queryText is a non-empty string and validate cteName is short (e.g. max
length 100) and only contains safe characters (alphanumerics, underscore,
optionally quoted double quotes), returning null or throwing if validation
fails; after those checks continue to construct escapedName and run the existing
RegExp logic. This keeps the existing behavior (use of escapedName, match
extraction and returned cteStart/boundaryStart) but prevents long/complex inputs
from being used in the regex and removes static-analysis warnings about ReDoS.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b865efba-6a96-4187-8be3-d415093fb74d
📒 Files selected for processing (1)
tests/unit/lib/availability-service.test.ts
There was a problem hiding this comment.
Code Review
This pull request simplifies the availability finalized condition by replacing a SQL function call with a direct null check on statusCode, which improves query performance by allowing the use of partial indexes. The unit tests have been updated to reflect this change and introduce a new findCteBoundary helper for more reliable SQL parsing. Feedback indicates that the implementation of this helper is somewhat fragile and could be improved by using capturing groups or the RegExp d flag to more robustly determine the start of the CTE name.
|
已继续处理第二轮 AI review:
请再跑一轮 AI review。 @coderabbitai review |
|
正在触发新一轮完整 AI review,请稍候。 ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Code Review
This pull request simplifies the availability monitoring logic by replacing a complex SQL function with a direct check for non-null status codes to improve database performance. It also enhances the test suite with a more robust helper for identifying SQL CTE boundaries. A review comment identifies a potential issue where this change might exclude network-level failures from monitoring if they lack a status code, which could lead to inaccurate availability metrics.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/unit/lib/availability-service.test.ts (2)
42-63: 💤 Low value输入验证有效防御了 ReDoS 风险,但建议添加注释说明。
静态分析工具标记了 50 行从变量构造正则表达式的潜在 ReDoS 风险。当前实现通过以下措施有效缓解了该风险:
- 长度限制:46 行限制
cteName最大 100 字符- 格式验证:46 行确保输入符合 SQL 标识符模式
[a-z_][a-z0-9_]*- 特殊字符转义:50 行正确转义了所有正则元字符
- 受控调用:实际仅在 66-67 行用硬编码字符串调用
虽然当前代码安全,但建议在函数开头添加注释明确说明这些安全措施,便于后续维护者理解设计意图。
建议添加的注释
function findCteBoundary( queryText: string, cteName: string ): { cteStart: number; boundaryStart: number } | null { + // Security: Input validation below prevents ReDoS attacks by constraining + // cteName length and format before using it in regex construction. if (queryText.length === 0 || cteName.length > 100 || !/^[a-z_][a-z0-9_]*$/i.test(cteName)) {Based on learnings: 静态分析工具标记从变量构造正则表达式存在 ReDoS 风险(CWE-1333),需要通过输入验证或使用 recheck 等库来确保安全。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/lib/availability-service.test.ts` around lines 42 - 63, Add a short comment at the top of function findCteBoundary explaining why constructing a RegExp from cteName is safe: note the explicit input validation on cteName (length <= 100 and format check /^[a-z_][a-z0-9_]*$/i), the escaping step that produces escapedName via cteName.replace(...), and that the RegExp using escapedName is only used in this controlled context; reference the variables cteName, escapedName and the RegExp construction to make the rationale clear for future maintainers and static analyzers.
65-74: ⚡ Quick win改进错误消息以区分不同失败场景。
当前错误消息"Could not locate finalized_requests CTE in query text"无法区分以下三种失败情况:
finalized_requestsCTE 不存在provider_bucket_statsCTE 不存在- 两个 CTE 存在但顺序错误
建议根据
start和end的值提供更具体的错误消息,便于调试测试失败时快速定位原因。建议的改进
function extractFinalizedRequestsSql(queryText: string): string { const start = findCteBoundary(queryText, "finalized_requests"); const end = findCteBoundary(queryText, "provider_bucket_stats"); - if (!start || !end || end.boundaryStart <= start.cteStart) { - throw new Error("Could not locate finalized_requests CTE in query text"); + if (!start) { + throw new Error("Could not locate finalized_requests CTE in query text"); + } + if (!end) { + throw new Error("Could not locate provider_bucket_stats CTE in query text"); + } + if (end.boundaryStart <= start.cteStart) { + throw new Error("CTEs found in unexpected order or overlapping"); } return queryText.slice(start.cteStart, end.boundaryStart); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/lib/availability-service.test.ts` around lines 65 - 74, In extractFinalizedRequestsSql, improve the thrown Error to distinguish three failure cases: missing finalized_requests CTE (start is falsy), missing provider_bucket_stats CTE (end is falsy), and ordering issue (both present but end.boundaryStart <= start.cteStart). Update the function to check these conditions and throw distinct, descriptive messages that reference the CTE names (e.g., "finalized_requests CTE not found", "provider_bucket_stats CTE not found", "finalized_requests CTE appears after provider_bucket_stats CTE"), so test failures clearly indicate which scenario occurred.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/lib/availability-service.test.ts`:
- Around line 42-63: Add a short comment at the top of function findCteBoundary
explaining why constructing a RegExp from cteName is safe: note the explicit
input validation on cteName (length <= 100 and format check
/^[a-z_][a-z0-9_]*$/i), the escaping step that produces escapedName via
cteName.replace(...), and that the RegExp using escapedName is only used in this
controlled context; reference the variables cteName, escapedName and the RegExp
construction to make the rationale clear for future maintainers and static
analyzers.
- Around line 65-74: In extractFinalizedRequestsSql, improve the thrown Error to
distinguish three failure cases: missing finalized_requests CTE (start is
falsy), missing provider_bucket_stats CTE (end is falsy), and ordering issue
(both present but end.boundaryStart <= start.cteStart). Update the function to
check these conditions and throw distinct, descriptive messages that reference
the CTE names (e.g., "finalized_requests CTE not found", "provider_bucket_stats
CTE not found", "finalized_requests CTE appears after provider_bucket_stats
CTE"), so test failures clearly indicate which scenario occurred.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c1f8ac44-fc35-46c3-9270-46a80e13bc3b
📒 Files selected for processing (1)
tests/unit/lib/availability-service.test.ts
|
已处理第三轮 CodeRabbit 的两条 nitpick:
本轮验证:
请再跑一轮 AI review。 @coderabbitai review |
|
正在触发新一轮完整 AI review,请稍候。 ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Code Review
This pull request simplifies the availability monitoring logic by replacing the fn_is_message_request_finalized SQL function with a direct check for non-null status codes. Additionally, the unit tests were updated with a more robust findCteBoundary helper for parsing SQL strings. Feedback was provided to improve the findCteBoundary function by ensuring case-consistency between the input validation and the regex search to prevent potential mismatches.
| if (queryText.length === 0 || cteName.length > 100 || !/^[a-z_][a-z0-9_]*$/i.test(cteName)) { | ||
| return null; | ||
| } | ||
|
|
||
| const escapedName = cteName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); | ||
| const match = new RegExp(`(^|\\bwith\\s+|,\\s*)("?${escapedName}"?)\\s+as\\s*\\(`).exec( | ||
| queryText | ||
| ); |
There was a problem hiding this comment.
The findCteBoundary function performs a case-insensitive validation on cteName (using the /i flag on line 47), but the RegExp constructed on line 52 is case-sensitive. Since queryText is explicitly lowercased by normalizeSql, a cteName passed with uppercase characters would pass the initial validation but fail to match the query text. For consistency and robustness, the RegExp should also use the i flag, or cteName should be lowercased before constructing the regex.
| if (queryText.length === 0 || cteName.length > 100 || !/^[a-z_][a-z0-9_]*$/i.test(cteName)) { | |
| return null; | |
| } | |
| const escapedName = cteName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); | |
| const match = new RegExp(`(^|\\bwith\\s+|,\\s*)("?${escapedName}"?)\\s+as\\s*\\(`).exec( | |
| queryText | |
| ); | |
| if (queryText.length === 0 || cteName.length > 100 || !/^[a-z_][a-z0-9_]*$/i.test(cteName)) { | |
| return null; | |
| } | |
| const escapedName = cteName.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); | |
| const match = new RegExp("(^|\\bwith\\s+|,\\s*)(\"?" + escapedName + "\"?)\\s+as\\s*\\(", "i").exec( | |
| queryText | |
| ); |
Related Issues & PRs
fn_is_message_request_finalized函数调用使查询无法命中部分索引idx_message_request_provider_created_at_finalized_active。statusCode IS NOT NULL改为fn_is_message_request_finalized(...),引入了本次性能回归。statusCode IS NOT NULL终态边界并创建了对应的部分索引。背景
可用性监控的聚合查询加载很久。排查后确认热路径查询原本依赖
message_request上的部分索引:idx_message_request_provider_created_at_finalized_active该索引的谓词包含
deleted_at IS NULL AND status_code IS NOT NULL。近期查询条件改为fn_is_message_request_finalized(...)后,SQL 不再显式包含status_code IS NOT NULL,Postgres 难以使用这个部分索引,导致 provider + 时间范围聚合容易退化为大范围扫描。同时,可用性监控不能直接把 requesting 中已出现
providerChain/errorMessage片段的请求计为终态,否则会把请求中的请求误算成失败。因此这里恢复用status_code IS NOT NULL作为可用性监控的终态边界,继续使用fn_compute_message_request_success_rate_outcome(...)对终态记录做成功/失败/排除分类。改动
statusCode IS NOT NULL,重新对齐现有部分索引。status_code IS NOT NULL,并继续覆盖 outcome 分类函数。验证
bunx vitest run tests/unit/lib/availability-service.test.tsbunx biome check src/lib/availability/availability-service.ts tests/unit/lib/availability-service.test.tsbun run typecheckbun run lintbun run lint:fixbun run testbun run build说明:
bun run build通过,过程中仍会输出仓库既有的 Edge Runtime 警告。Greptile Summary
This PR restores
status_code IS NOT NULLas the finalized-request boundary in availability monitoring, replacing thefn_is_message_request_finalized(...)call that was preventing PostgreSQL from using the partial indexidx_message_request_provider_created_at_finalized_active, causing full-scan degradation on the hot aggregation path.availability-service.ts:buildAvailabilityFinalizedCondition()now returnsisNotNull(messageRequest.statusCode)(one line, uses the newisNotNullimport). Outcome classification (fn_compute_message_request_success_rate_outcome) is preserved and still drivesgreenCount/redCountviaFILTER (WHERE successRateOutcome IN ('success', 'failure')), so blocked/excluded records with a statusCode remain harmless.availability-service.test.ts:extractFinalizedRequestsSqlis hardened with a newfindCteBoundaryhelper that uses a regex anchored to CTE keyword boundaries instead of a fragileindexOf. Four test assertions are updated fromfn_is_message_request_finalizedto"status_code" is not null; one previously missing assertion is added to the Gemini passthrough test.Confidence Score: 5/5
Safe to merge — the change is a targeted one-line revert in the finalized condition, backed by updated tests that assert the correct SQL predicate appears in the generated query.
The core change is a single, well-scoped substitution: replacing a PL/pgSQL function call with a direct column null-check that the existing partial index was already designed to serve. Outcome classification is untouched and still controls which finalized records count as green or red. The test helper improvement is strictly additive and more correct than the previous indexOf-based extraction. No logic paths were removed or silently skipped.
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram participant Client participant AvailService as AvailabilityService participant DB as PostgreSQL Client->>AvailService: queryProviderAvailability(options) AvailService->>DB: "SELECT providers WHERE isEnabled=true" DB-->>AvailService: [providerList] AvailService->>DB: WITH finalized_requests AS (SELECT ... FROM message_request WHERE status_code IS NOT NULL AND deleted_at IS NULL), provider_bucket_stats AS (SELECT ... COUNT/AVG/PERCENTILE via fn_compute_outcome) SELECT ... FROM limited_provider_bucket_stats DB-->>AvailService: [bucketRows] AvailService-->>Client: AvailabilityQueryResult note over AvailService,DB: Previously: WHERE fn_is_message_request_finalized(...) skipped partial index note over AvailService,DB: Now: WHERE status_code IS NOT NULL hits idx_message_request_provider_created_at_finalized_activeComments Outside Diff (1)
tests/unit/lib/availability-service.test.ts, line 42-51 (link)extractFinalizedRequestsSqlsilently passes if anchor text collidesextractFinalizedRequestsSqlslices between the first occurrence of"finalized_requests as"and"provider_bucket_stats as". If Drizzle ever quotes identifiers differently or the CTE names shift, bothindexOfcalls can return-1and the guard throws — but if only the second sentinel is missing while the first is present,end === -1is caught. More critically, if any literal string in the query accidentally contains"provider_bucket_stats as"earlier than the actual CTE definition, the slice would be silently truncated, causing assertions on the extracted snippet to produce false-positives. A more robust anchor using a regex with a CTE keyword boundary would make this helper more resilient to query text drift.Prompt To Fix With AI
Reviews (4): Last reviewed commit: "test: 优化 CTE 边界测试诊断" | Re-trigger Greptile