Skip to content

优化公开状态页 Redis 聚合与轮询性能#1211

Open
tesgth032 wants to merge 5 commits into
ding113:devfrom
tesgth032:fix/public-status-redesign
Open

优化公开状态页 Redis 聚合与轮询性能#1211
tesgth032 wants to merge 5 commits into
ding113:devfrom
tesgth032:fix/public-status-redesign

Conversation

@tesgth032
Copy link
Copy Markdown
Contributor

@tesgth032 tesgth032 commented May 19, 2026

相关 PR

背景

公开 /status 页面之前依赖按窗口从 message_request 回扫并重建投影。数据量上来后,这条路径会把公开状态页的刷新成本放大到"时间窗口 × 端点/请求历史 × 模型/分组",并且读侧轮询会反复拉完整 timeline,导致 Redis/DB/浏览器都承压。

本 PR 重新设计公开状态页的数据面:写入时增量写 Redis 5 分钟 rollup,后台 worker 只从 rollup 物化 public-safe snapshot,公开 API 只读 snapshot,不再扫 DB。

设计要点

  • 新增 public-status:v2 Redis namespace,保留 public-status:v1 投影作为升级期 fallback。
  • 新增 5 分钟底层 rollup hash:public-status:v2:rollup:5m:{bucketStart}
  • rollup field 维度为 {providerGroupId}|{publicModelKey}|metric,不包含 endpoint/internal provider URL,避免"端点 × 模型 × 时间"爆炸。
  • metric 使用 success/failure/ttfb_sum/ttfb_count/tps_sum/tps_count,worker 按公开配置的 5/15/30/60m 展示间隔合并。
  • 首次写入记录 public-status:v2:rollup:coverage-start:5m,窗口覆盖不完整时公开读侧优先服务 v1 snapshot,并标记 stale;没有 v1 时服务 v2 stale。
  • v2 rollup 写入与 worker 只使用 v2 internal config,不用 v1 config 继续生产新 rollup,避免升级期把旧配置语义写进新 namespace。
  • provider group 归因优先使用稳定 provider_groups.id,旧 internal snapshot 没有 id 时才 fallback 到 group name。
  • provider/model 低频新增会自然随 config publish 进入后续 rollup;provider group 删除后,新 config 不再读取其 rollup 字段,旧 rollup 由 TTL 清理。

前端性能

  • 状态页轮询改成 include=meta,defaults,groups,不再每 30 秒拉 timeline。
  • 如果轮询 payload 不含 timeline,复用本地已有 timeline 渲染图表;当前状态、可用率等摘要使用最新 payload,避免徽标滞后。
  • timeline 去掉每个格子的 Radix Tooltip 实例,改成单个 active summary,降低大量 DOM/portal 开销。

兼容与升级

  • Docker/普通升级不需要迁移数据库,也不需要手动清 Redis。
  • 新版本启动后会发布 v2 config,并开始写 v2 rollup;v2 覆盖完整前继续回退到 v1 公开 snapshot,避免状态页空窗。
  • 旧 v1 keys 不主动删除,保留给升级 fallback;v2 versioned config/snapshot/rollup 都有 TTL。
  • TTFB/TPS 口径从旧 DB 聚合的中位数改为 rollup 的 sum/count 均值,这是为了避免保留原始样本数组导致 Redis 体积和写入成本失控。

验证

  • bun run typecheck
  • bun run lint:fix
  • bun run lint
  • 定向状态页/rollup 测试:vitest.cmd run tests\\unit\\public-status\\public-status-view.test.tsx tests\\unit\\public-status\\rollup-store.test.ts tests\\unit\\public-status\\read-store.test.ts tests\\unit\\public-status\\rebuild-worker.test.ts tests\\unit\\repository\\message-public-status-rollup.test.ts
  • bun run build
  • bun run test

全量测试结果:684 个测试文件通过,6099 个测试通过,13 个跳过。构建通过;Edge Runtime / Redis 未配置 / DSN fallback 等日志为现有测试和构建环境噪音。


Description enhanced by Claude AI

Greptile Summary

This PR replaces the public-status read path's per-request DB back-scan with an incremental v2 Redis rollup architecture: 5-minute HINCRBYFLOAT rollup buckets written at request finalization, a background worker that materializes snapshots from those buckets, and a v1-fallback during the coverage warm-up window. The front-end polling is also optimized to omit per-bucket timeline data and reuse locally cached timelines between polls.

  • Backend: Adds rollup-store.ts (write/read/aggregate rollup buckets), extends rebuild-worker.ts to build projections from rollups instead of DB queries, and adds dual-namespace (v2 primary / v1 legacy fallback) support to read-store.ts, config-snapshot.ts, and redis-contract.ts.
  • Frontend: Polling requests now use include=meta,defaults,groups (no timeline), with client-side timeline reuse logic and a single shared hover summary replacing per-cell Radix Tooltip instances.

Confidence Score: 5/5

The PR is safe to merge. The rollup write path is fire-and-forget with dedup guards, the v1 fallback activates whenever v2 coverage is incomplete, and no database migrations are needed.

The core rollup pipeline — write, aggregate, serve — is internally consistent: bucket alignment uses the same alignBucketStartUtc constant throughout, the rollupCoverageComplete flag correctly gates the legacy fallback, and the frontend merge logic handles every edge case. The two noted issues are purely about code clarity, not runtime correctness.

No files require special attention. The read-store.ts fallback branch worth a comment is the only area that could confuse a future reader.

Important Files Changed

Filename Overview
src/lib/public-status/rollup-store.ts New file implementing the v2 rollup write/read/aggregate pipeline; bucketing, field encoding, coverage-start tracking, and gap-fill logic are correct
src/lib/public-status/rebuild-worker.ts Refactored to build projections from rollup buckets instead of DB queries; rollupCoverageComplete flag correctly gated, distributed lock flow unchanged
src/lib/public-status/read-store.ts Dual-namespace v2/v1 fallback logic added; the shared readProjection helper has an unreachable branch when configVersion is "current" (no-op but not wrong)
src/repository/message.ts Rollup write wired into createMessageRequest/updateMessageRequestDetails; process-local seed cache and finalization dedup look correct but multi-pod assumptions are undocumented
src/lib/public-status/redis-contract.ts Prefix promoted from v1 to v2, LEGACY constant added, rollup/coverage-start key builders added; alignBucketStartUtc usage is consistent
src/app/[locale]/status/_components/public-status-view.tsx Poll URL gains include=meta,defaults,groups; timeline reuse + ViewModelSnapshot merge logic is correct; deriveCurrentModelState fall-through to latestState works
src/app/[locale]/status/_components/public-status-timeline.tsx TooltipProvider replaced with a single active-summary panel; mouse/focus/blur handlers cover enter and leave correctly
src/lib/public-status/config-snapshot.ts Legacy fallback loop added to both readCurrentPublicStatusConfigSnapshot and readCurrentInternalPublicStatusConfigSnapshot; regex fix for config-internal key pattern is correct
src/lib/public-status/rebuild-hints.ts New pttl early-return optimization avoids redundant writes when a hint is already live; null/undefined/missing-function cases are all handled safely

Sequence Diagram

sequenceDiagram
    participant Req as Request Handler
    participant Msg as message.ts
    participant RS as rollup-store.ts
    participant Redis as Redis (v2)
    participant Worker as rebuild-worker.ts
    participant ReadStore as read-store.ts
    participant Client as Browser

    Req->>Msg: createMessageRequest()
    Msg->>Msg: rememberPublicStatusRequestSeed(id, seed)

    Req->>Msg: updateMessageRequestDetails(id, details)
    Msg->>RS: queuePublicStatusRollupWrite(event, groups)
    RS->>Redis: "HINCRBYFLOAT rollup:5m:{bucket} field value"
    RS->>Redis: SET coverage-start:5m (NX)
    RS->>Redis: EXPIRE keys (32 days)

    Worker->>Redis: GET coverage-start key
    Worker->>Redis: HGETALL rollup buckets (batched pipeline)
    Worker->>Worker: buildPublicStatusPayloadFromRollups()
    Worker->>Worker: isRollupCoverageComplete()
    Worker->>Redis: "SET snapshot:{gen}"
    Worker->>Redis: SET manifest:current

    Client->>ReadStore: readPublicStatusPayload()
    ReadStore->>Redis: GET manifest (v2)
    alt "rollupCoverageComplete === false"
        ReadStore->>Redis: GET manifest (v1 legacy)
        ReadStore-->>Client: "legacy projection (rebuildState=stale)"
    else coverage complete
        ReadStore->>Redis: GET snapshot (v2)
        ReadStore-->>Client: v2 projection
    end
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
src/lib/public-status/read-store.ts:227-253
**No-op `currentManifest` fallback when `configVersion` defaults to `"current"`**

When `readProjection` is called without an explicit `configVersion`, `manifestConfigVersion` is `"current"`, so `currentManifest` is assigned the same object reference as `manifest`. The branch that follows (`if (!resolution.sourceGeneration && currentManifest)`) then re-resolves the identical manifest and forces `rebuildState: "stale"`, but because `resolution.sourceGeneration` is still `null`, the function falls through to a `manifest-missing` miss regardless. The branch only produces a meaningful result when a specific `configVersion` is passed and the versioned manifest is unready while the `"current"` pointer still holds a valid generation. A comment clarifying this would prevent future readers from spending time tracing the no-op path.

### Issue 2 of 2
src/lib/public-status/rollup-store.ts:187-192
**`getPublicStatusGroupRollupIds` always returns a single-element array**

The function signature and variable name (`groupIds`, iterated with `for...of` in `buildPublicStatusPayloadFromRollups`) suggest multi-ID aggregation — e.g., reading a numeric-ID bucket _and_ a legacy name-based bucket simultaneously during a migration. In practice only the primary ID is ever returned, so the loop in the aggregation path is always a single iteration. If dual-key backward compatibility during provider-group ID migration is not planned, collapsing this to a single `string` return type would remove the misleading implication; if it is planned, a comment explaining the intended expansion would help.

Reviews (4): Last reviewed commit: "完善公开状态页轮询测试清理" | Re-trigger Greptile

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

迁移公共状态聚合到按时间桶 rollup 存储并引入 v2 Redis schema 与 legacy 回退;提取并复用模型厂商规则;更新前端时间线交互与轮询复用逻辑;在消息仓库添加 rollup 事件入队与去重缓存;大量单测覆盖。

Changes

Model vendor system extraction

Layer / File(s) Summary
Vendor rule types and matching
src/lib/model-vendor-rules.ts
新增 ModelVendorRule 常量表与 getModelVendor,实现基于前缀的最长匹配并在未匹配时返回 null。
Icon module refactor to use vendor rules
src/lib/model-vendor-icons.tsx
图标模块改为调用新规则模块的 getModelVendor,新增 MODEL_VENDOR_ICON_BY_KEY 映射;ModelVendorEntry 现在由规则类型组合并注入 icon;缺失图标在非 production 环境警告并回退为 OpenAI。
Vendor icon key resolution update
src/lib/public-status/vendor-icon-key.ts
更新 getModelVendor 的导入来源为新的规则模块。

Public status rollup aggregation system

Layer / File(s) Summary
Rollup types and aggregation core
src/lib/public-status/aggregation-core.ts
新增 PublicStatusConfiguredGroup 接口与 computeTokensPerSecond,作为聚合共享类型与工具。
Redis contract v2 schema with prefix and rollup support
src/lib/public-status/redis-contract.ts
将 PUBLIC_STATUS_REDIS_PREFIX 升级到 v2,新增 LEGACY_PUBLIC_STATUS_REDIS_PREFIX 与 PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES;manifest 增加 rollup 覆盖字段;key builder 支持可选 prefix;新增 rollup 与 coverage-start key 构建函数。
Rollup store read/write and aggregation logic
src/lib/public-status/rollup-store.ts
实现 rollup 字段编码/解码、增量生成、写入(支持 pipeline 与 coverage-start NX 写入)、批量读取、gap-fill 填充与基于桶的 payload 聚合(计算 availability/ttfb/tps/state 与 latestState)。
Config snapshot legacy fallback support
src/lib/public-status/config-snapshot.ts
快照读取器支持 allowLegacyFallback 选项;InternalPublicStatusGroupSnapshot 新增 sourceGroupId;实现按前缀的 v2→legacy 回退流程;更新 config key 正则以兼容 -internal 形式。
Aggregation module updated for extracted types
src/lib/public-status/aggregation.ts
将 PublicStatusConfiguredGroup 与 computeTokensPerSecond 移至 aggregation-core 并从 aggregation 重新导出;getConfiguredPublicStatusGroups 映射中新增 sourceGroupId;聚合中仅对 outcome === 'success' 的样本采集 ttfb/tps。
Config publisher sourceGroupId mapping
src/lib/public-status/config-publisher.ts
publishCurrentPublicStatusConfigProjection 计算 providerGroupIdByName 并在 internal snapshot groups 中填充 sourceGroupId。
Read-store refactoring to projection pattern
src/lib/public-status/read-store.ts
将 readPublicStatusPayload 拆为 readProjection 与 projectionToPayload;优先读取 v2 投影,rollup 覆盖未完成时尝试 legacy 回退并触发相应 rebuild hints;configVersion 不匹配或覆盖不完整返回 stale。
Rebuild worker migration to rollup-based aggregation
src/lib/public-status/rebuild-worker.ts
重建工作者使用 rollup buckets 替代基于请求的聚合:读取 rollup、计算 rollupCoverageStartedAt/rollupCoverageComplete/rollupSampleCount,并在发布 manifest 时写入这些字段;重建期间禁用 legacy 回退并基于 snapshot 派生分组。
Rebuild hints TTL pre-check optimization
src/lib/public-status/rebuild-hints.ts
schedulePublicStatusRebuild 在写入前检查现有 key 的 pttl,若 TTL 有效则跳过重写并直接返回 accepted 状态。
Scheduler pattern update for v2 prefix
src/lib/public-status/scheduler.ts
将 rebuild-hint 的扫描模式从硬编码 v1 前缀改为基于 PUBLIC_STATUS_REDIS_PREFIX 的模板拼接。

Message repository rollup event integration

Layer / File(s) Summary
Message request seed cache and event queueing
src/repository/message.ts
新增内存 request seed 与 finalized claim / in-flight 去重机制;在 create/update 路径写入/更新 seed;在 details 最终化时消费 seed(或回读 DB)组装 PublicStatusRollupEvent 并调用 queuePublicStatusRollupWrite,缺失/不可写或错误时记录 logger.warn;包含同步/异步写入路径兼容。

Frontend component updates for timeline state management

Layer / File(s) Summary
Status page notFound fix
src/app/[locale]/status/[slug]/page.tsx
将 notFound() 调用改为 return notFound(),确保 targetGroup 未找到时终止渲染分支。
Timeline component tooltip to internal state
src/app/[locale]/status/_components/public-status-timeline.tsx
替换 TooltipProvider/TooltipContent 为组件内 activeIndex 状态;按钮通过 onFocus/onMouseEnter 更新激活;onMouseLeave/onBlur(检查 relatedTarget 是否在容器内)清除激活;通过 useMemo 计算 activeSummary 并在时间线下方渲染统一汇总面板,缺失值显示为 —。
Status view model state and timeline reuse handling
src/app/[locale]/status/_components/public-status-view.tsx
新增 ViewModelSnapshot.timelineReusedFromPrevious 标记;deriveCurrentModelState 在复用时返回 latestState;轮询请求包含 include=meta,defaults,groups;payload 更新改为基于 previous 合并:当新 timeline 为空时复用上一轮 timeline 并标记复用;uptime24h 在复用且 availabilityPct 存在时直接使用,否则计算;latest 状态使用 deriveCurrentModelState。

Comprehensive test coverage for rollup system and updates

Layer / File(s) Summary
Rollup store behavior tests
tests/unit/public-status/rollup-store.test.ts
新增测试验证 increments 生成、客户端失败排除、单桶写入策略、pipeline 读取、聚合与可用性计算、降级状态检测、字段编码/解码、interval 防护等。
Message repository rollup integration tests
tests/unit/repository/message-public-status-rollup.test.ts
新增含 DB/logger/rollup 队列 mock 的测试,验证并发 details 更新仅触发一次队列并校验事件字段;验证缺 seed 时回读持久化种子并正确入队;验证 retryable 路径与配置不可用重试语义。
Config snapshot legacy fallback tests
tests/unit/public-status/config-snapshot.test.ts
新增接口与用例验证 readCurrentInternalPublicStatusConfigSnapshot 的 allowLegacyFallback 控制行为,确保 v2 rollup 写入器可禁用 legacy 回退。
Config publisher sourceGroupId tests
tests/unit/public-status/config-publisher.test.ts
更新测试断言,以验证 internal snapshot groups 包含 sourceGroupId。
Redis contract v2 key generation tests
tests/unit/public-status/redis-contract.test.ts
新增常量到测试模块并验证 v2 key 默认值、显式 v1 前缀生成、rollup key 与 coverage-start key 格式与默认 bucket 分钟数。
Rebuild worker rollup and hint tests
tests/unit/public-status/rebuild-worker.test.ts
更新测试以使用 hgetall mock 驱动 rollup 行为;新增覆盖全窗口标记与 hint TTL 预检查用例;更新 manifest 覆盖字段断言并改用 key builder 断言。
Read-store legacy fallback and rollup coverage tests
tests/unit/public-status/read-store.test.ts
新增测试验证 v2 rollout 回退到 legacy v1 投影的 stale/trigger 序列;验证 rollup 覆盖不完整时继续返回 legacy;以及 v2 覆盖不完整且 legacy 缺失时的行为。
Public status view polling and state reuse tests
tests/unit/public-status/public-status-view.test.tsx
更新轮询断言以包含 include=meta,defaults,groups;添加 filterSlug 变体;新增测试验证轮询期间 timeline 复用时摘要状态仍能更新。
DB import guard for rollup modules
tests/unit/public-status/no-db-import-guard.test.ts
将 rollup-store 与 aggregation-core 列入 guardedFiles 与 directTokenGuardFiles,确保无不当 DB 导入。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed PR 描述详细阐述了设计目标、实现方案、性能优化与兼容策略,与代码变更高度关联。
Title check ✅ Passed 标题「优化公开状态页 Redis 聚合与轮询性能」准确概括了本 PR 的核心变更——将公开状态页数据聚合架构从数据库回扫重设计为基于 Redis rollup + snapshot,并优化了轮询性能。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added enhancement New feature or request area:core area:statistics size/XL Extra Large PR (> 1000 lines) labels May 19, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This XL PR redesigns the public status page data pipeline from DB-scanned aggregation to Redis-based incremental rollup (v2 namespace), with a backward-compatible v1 fallback during upgrade. The frontend reduces polling payload size by excluding timeline data and reusing local state. The implementation is thorough: error handling is consistent (fire-and-forget writes use catch+log, reads use safe fallbacks), the v1/v2 migration path is well-structured with stale markers and rebuild hints, and new functionality has dedicated test coverage.

PR Size: XL

  • Lines changed: 2985 (2640 additions + 345 deletions)
  • Files changed: 26
  • Split suggestion: This PR could be split into: (1) model-vendor-rules.ts extraction + aggregation-core.ts refactor, (2) rollup-store.ts + redis-contract.ts v2 keys + rebuild-worker.ts rollup integration, (3) message.ts rollup hooks + read-store.ts v1/v2 fallback, (4) frontend polling + timeline changes. However, the v2 rollout depends on all backend pieces working together, so the monolithic approach is understandable.

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Observations (below reporting threshold)

  • public-status-view.tsx uses as ViewModelSnapshot casts to access the injected timelineReusedFromPrevious field. This bypasses type checking but is a pragmatic tradeoff to avoid leaking a view-layer concern into the shared PublicStatusPayload type. (Confidence: 72)
  • Module-level mutable caches (publicStatusRequestSeedCache in message.ts, cachedConfiguredGroups in rollup-store.ts) are bounded by size/TTL respectively and are acceptable in the server-side Next.js context. (Confidence: 65)
  • Non-pipeline Redis fallback in writePublicStatusRollupEvent could produce partial writes on individual hincrbyfloat failures, but errors propagate to queuePublicStatusRollupWrite which catches and logs them, and ioredis always supports pipelines in production. (Confidence: 62)

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean (no SQL injection vectors introduced, Redis keys use proper encoding, no user input reaches eval/lua scripts)
  • Error handling - Clean (all new async paths have catch+log, fire-and-forget writes use void IIFE with try/catch)
  • Type safety - Clean (no any usage in new code, discriminated unions used for read outcomes)
  • Documentation accuracy - Clean (v1 comment references updated to version-agnostic)
  • Test coverage - Adequate (new rollup-store 324-line test file, new message-rollup 199-line test file, updated tests for all modified modules)
  • Code clarity - Good (helper extraction in read-store reduces duplication, single active timeline summary reduces DOM overhead)

Automated review by Claude AI

Comment thread src/repository/message.ts
Comment on lines +38 to +50
const publicStatusRequestSeedCache = new Map<number, PublicStatusRequestSeed>();
const PUBLIC_STATUS_REQUEST_SEED_CACHE_MAX_SIZE = 10_000;

function rememberPublicStatusRequestSeed(id: number, seed: PublicStatusRequestSeed): void {
publicStatusRequestSeedCache.set(id, seed);
if (publicStatusRequestSeedCache.size <= PUBLIC_STATUS_REQUEST_SEED_CACHE_MAX_SIZE) {
return;
}

const firstKey = publicStatusRequestSeedCache.keys().next().value as number | undefined;
if (firstKey !== undefined) {
publicStatusRequestSeedCache.delete(firstKey);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Process-local seed cache silently drops rollup in multi-pod deployments

publicStatusRequestSeedCache is a module-level Map that lives only in the running process. In a horizontally-scaled setup (multiple pods/workers), createMessageRequest and updateMessageRequestDetails can hit different pods; when they do, consumePublicStatusRequestSeed on the second pod finds nothing and the rollup write is silently skipped — no error, no counter, just lost data. The codebase already accepts this tradeoff for enqueueMessageRequestUpdate, but it's worth a comment making the single-process assumption explicit so future scaling work knows to revisit this path.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/message.ts
Line: 38-50

Comment:
**Process-local seed cache silently drops rollup in multi-pod deployments**

`publicStatusRequestSeedCache` is a module-level `Map` that lives only in the running process. In a horizontally-scaled setup (multiple pods/workers), `createMessageRequest` and `updateMessageRequestDetails` can hit different pods; when they do, `consumePublicStatusRequestSeed` on the second pod finds nothing and the rollup write is silently skipped — no error, no counter, just lost data. The codebase already accepts this tradeoff for `enqueueMessageRequestUpdate`, but it's worth a comment making the single-process assumption explicit so future scaling work knows to revisit this path.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +358 to +388
const coverageStartKey = buildPublicStatusRollupCoverageStartKey();
const bucketStartIso = alignBucketStartUtc(createdAtIso, PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES);
const redis = input.redis ?? getRedisClient({ allowWhenRateLimitDisabled: true });
if (
!redis ||
("status" in redis && redis.status && redis.status !== "ready") ||
typeof redis.hincrbyfloat !== "function"
) {
return { written: false, incrementCount: increments.length, key };
}

if (typeof redis.pipeline === "function") {
const pipeline = redis.pipeline();
for (const increment of increments) {
pipeline.hincrbyfloat(key, buildPublicStatusRollupField(increment), increment.value);
}
if (typeof pipeline.set === "function") {
pipeline.set(coverageStartKey, bucketStartIso, "NX");
}
pipeline.expire(key, ROLLUP_TTL_SECONDS);
await pipeline.exec();
} else {
for (const increment of increments) {
const field = buildPublicStatusRollupField(increment);
await redis.hincrbyfloat(key, field, increment.value);
}
if (typeof redis.set === "function") {
await redis.set(coverageStartKey, bucketStartIso, "NX");
}
await redis.expire?.(key, ROLLUP_TTL_SECONDS);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Coverage-start key has no TTL — stale coverage claim after bucket expiry

coverageStartKey is written with NX and no expiry, so it persists indefinitely. Rollup bucket hashes use ROLLUP_TTL_SECONDS = 60 * 60 * 24 * 32 (32 days). If the service is offline for longer than 32 days (or Redis is rebuilt from a backup that lacks recent data), all rollup buckets expire while the coverage-start key remains. The next rebuild calls isRollupCoverageComplete with a coverageStartedAt from 32+ days ago and a coveredFrom only ~24 h ago — the check passes, rollupCoverageComplete is stored as true, and the read side serves an empty v2 snapshot instead of falling back to v1. Setting the same ROLLUP_TTL_SECONDS on this key would keep it in sync with the data it covers.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/rollup-store.ts
Line: 358-388

Comment:
**Coverage-start key has no TTL — stale coverage claim after bucket expiry**

`coverageStartKey` is written with `NX` and no expiry, so it persists indefinitely. Rollup bucket hashes use `ROLLUP_TTL_SECONDS = 60 * 60 * 24 * 32` (32 days). If the service is offline for longer than 32 days (or Redis is rebuilt from a backup that lacks recent data), all rollup buckets expire while the coverage-start key remains. The next rebuild calls `isRollupCoverageComplete` with a `coverageStartedAt` from 32+ days ago and a `coveredFrom` only ~24 h ago — the check passes, `rollupCoverageComplete` is stored as `true`, and the read side serves an empty v2 snapshot instead of falling back to v1. Setting the same `ROLLUP_TTL_SECONDS` on this key would keep it in sync with the data it covers.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +76 to +79
return {
...rule,
icon: MODEL_VENDOR_ICON_BY_KEY[rule.i18nKey] ?? OpenAI,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent OpenAI fallback for unrecognised i18nKey could show wrong branding

The original array kept icon and rule together, making divergence impossible. Now that they live in separate files, any vendor added to model-vendor-rules.ts without a matching entry in MODEL_VENDOR_ICON_BY_KEY silently renders the OpenAI logo. A dev-mode console.warn would surface the mismatch early.

Suggested change
return {
...rule,
icon: MODEL_VENDOR_ICON_BY_KEY[rule.i18nKey] ?? OpenAI,
};
const icon = MODEL_VENDOR_ICON_BY_KEY[rule.i18nKey];
if (!icon && process.env.NODE_ENV !== "production") {
console.warn(`[model-vendor-icons] No icon registered for i18nKey "${rule.i18nKey}"`);
}
return {
...rule,
icon: icon ?? OpenAI,
};
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/model-vendor-icons.tsx
Line: 76-79

Comment:
**Silent OpenAI fallback for unrecognised `i18nKey` could show wrong branding**

The original array kept icon and rule together, making divergence impossible. Now that they live in separate files, any vendor added to `model-vendor-rules.ts` without a matching entry in `MODEL_VENDOR_ICON_BY_KEY` silently renders the OpenAI logo. A dev-mode `console.warn` would surface the mismatch early.

```suggestion
  const icon = MODEL_VENDOR_ICON_BY_KEY[rule.i18nKey];
  if (!icon && process.env.NODE_ENV !== "production") {
    console.warn(`[model-vendor-icons] No icon registered for i18nKey "${rule.i18nKey}"`);
  }
  return {
    ...rule,
    icon: icon ?? OpenAI,
  };
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +147 to +159
export function buildPublicStatusRollupKey(input: {
bucketStartIso: string;
bucketMinutes?: number;
prefix?: string;
}): string {
const bucketMinutes = input.bucketMinutes ?? PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES;
assertPositiveInteger(bucketMinutes, "bucketMinutes");
return [
resolvePrefix(input.prefix),
"rollup",
`${bucketMinutes}m`,
encodeKeyPart(alignBucketStartUtc(input.bucketStartIso, bucketMinutes)),
].join(":");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 alignBucketStartUtc call site lacks documentation of alignment semantics

alignBucketStartUtc is called inside buildPublicStatusRollupKey without a visible definition nearby. An inline comment clarifying that it floors to the nearest bucketMinutes UTC boundary would help readers understand the key structure.

Suggested change
export function buildPublicStatusRollupKey(input: {
bucketStartIso: string;
bucketMinutes?: number;
prefix?: string;
}): string {
const bucketMinutes = input.bucketMinutes ?? PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES;
assertPositiveInteger(bucketMinutes, "bucketMinutes");
return [
resolvePrefix(input.prefix),
"rollup",
`${bucketMinutes}m`,
encodeKeyPart(alignBucketStartUtc(input.bucketStartIso, bucketMinutes)),
].join(":");
export function buildPublicStatusRollupKey(input: {
bucketStartIso: string;
bucketMinutes?: number;
prefix?: string;
}): string {
const bucketMinutes = input.bucketMinutes ?? PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES;
assertPositiveInteger(bucketMinutes, "bucketMinutes");
// alignBucketStartUtc floors the ISO timestamp to the nearest bucketMinutes boundary in UTC.
return [
resolvePrefix(input.prefix),
"rollup",
`${bucketMinutes}m`,
encodeKeyPart(alignBucketStartUtc(input.bucketStartIso, bucketMinutes)),
].join(":");
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/redis-contract.ts
Line: 147-159

Comment:
**`alignBucketStartUtc` call site lacks documentation of alignment semantics**

`alignBucketStartUtc` is called inside `buildPublicStatusRollupKey` without a visible definition nearby. An inline comment clarifying that it floors to the nearest `bucketMinutes` UTC boundary would help readers understand the key structure.

```suggestion
export function buildPublicStatusRollupKey(input: {
  bucketStartIso: string;
  bucketMinutes?: number;
  prefix?: string;
}): string {
  const bucketMinutes = input.bucketMinutes ?? PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES;
  assertPositiveInteger(bucketMinutes, "bucketMinutes");
  // alignBucketStartUtc floors the ISO timestamp to the nearest bucketMinutes boundary in UTC.
  return [
    resolvePrefix(input.prefix),
    "rollup",
    `${bucketMinutes}m`,
    encodeKeyPart(alignBucketStartUtc(input.bucketStartIso, bucketMinutes)),
  ].join(":");
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
src/lib/public-status/rollup-store.ts (1)

9-20: ⚡ Quick win

src/ 内部导入统一改成 @/ 别名。

这个文件里同时混用了 @/./...。既然这些模块都在 src/ 下,继续保留相对路径会让后续移动文件时更脆弱,也不符合仓库约定。 As per coding guidelines, **/*.{ts,tsx,js,jsx}: Use path alias @/ to map to ./src/ for imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/public-status/rollup-store.ts` around lines 9 - 20, The imports in
rollup-store.ts mix relative paths with the project alias; replace all local-src
imports that use "./..." with the "`@/`..." alias so they consistently map to
./src/ — e.g., change imports that bring in computeTokensPerSecond,
PublicStatusConfiguredGroup, InternalPublicStatusConfigSnapshot,
readCurrentInternalPublicStatusConfigSnapshot, PublicStatusPayload,
PublicStatusTimelineBucket, alignBucketStartUtc,
buildPublicStatusRollupCoverageStartKey, buildPublicStatusRollupKey, and
PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES to use the "`@/`..." path prefix; ensure
export names and types are unchanged and tests/build still resolve the updated
module specifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/`[locale]/status/_components/public-status-view.tsx:
- Around line 158-163: deriveCurrentModelState currently falls back to
deriveLatestModelState even when model.timeline is an empty array, which
downgrades newly-polled models to no_data and drops availabilityPct; change
deriveCurrentModelState (and other spots that call deriveLatestModelState for
initial/partial polls) to return model.latestState whenever model.timeline is
empty or model.timelineReusedFromPrevious is true (i.e., if
model.timeline?.length === 0 || model.timelineReusedFromPrevious) so the
server-provided latestState and availabilityPct are preserved instead of
recalculating from a missing timeline.

In `@src/lib/model-vendor-icons.tsx`:
- Line 36: Replace the relative import in model-vendor-icons.tsx with the
project alias: update the import that currently brings in getModelVendor as
getModelVendorRule and type ModelVendorRule from "./model-vendor-rules" to use
"`@/lib/model-vendor-rules`" instead, preserving the same named imports and alias
(getModelVendor as getModelVendorRule) and the type import (ModelVendorRule).

In `@src/lib/model-vendor-rules.ts`:
- Around line 10-113: The rules array should be a readonly literal so `prefix`
and `i18nKey` keep their literal types and the array cannot be mutated; remove
the explicit `ModelVendorRule[]` annotation on MODEL_VENDOR_RULES, declare it as
a const readonly literal (using `const ... = [...] as const`) so prefixes remain
string literal types and the array is immutable, and if you need the existing
ModelVendorRule type derive/align it from the literal (e.g. `type
ModelVendorRule = typeof MODEL_VENDOR_RULES[number]`) so downstream exhaustive
checks over `prefix`/`i18nKey` work and callers cannot reorder or mutate the
list.

In `@src/lib/public-status/rollup-store.ts`:
- Around line 520-538: buildBucketStarts silently approximates unsupported
intervalMinutes (e.g., 16) by rounding to bucket factors, causing misaligned
coveredTo/bucket boundaries; add a fail-fast validation at the top of
buildBucketStarts that rejects non-supported intervalMinutes (accept only
multiples of PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES or the explicit whitelist
5,15,30,60) and throw a clear error (include interval value) instead of
rounding; also apply the same strict validation to the related rollup boundary
logic that calls alignBucketStartUtc/uses intervalMinutes so all callers enforce
the same allowed intervals.
- Around line 298-337: The code is incorrectly attributing request-level
TTFB/TPS to every non-excluded group; only the group that actually succeeded
should receive ttfb_sum/ttfb_count and tps_sum/tps_count. Change the logic in
the loop over groupOutcome (where ttfbMs, tps, increments, outcome, groupId,
modelKey are used) so that the pushes of ttfb_* and tps_* happen only when
outcome === "success" (i.e., guard the ttfbMs !== null and tps !== null branches
with outcome === "success"); also add a unit/integration test covering a
failure-then-fallback chain ("A" fails, "B" succeeds) to ensure TTFB/TPS are
only recorded for the successful group.
- Around line 369-378: The code currently calls pipeline.exec() in
rollup-store.ts and ignores its per-command results, so any individual
HINCRBYFLOAT/EXPIRE failures are silently treated as success; change the logic
after calling pipeline.exec() (the result of redis.pipeline().exec()) to iterate
the returned array of [error, result] tuples, and if any tuple has a non-null
error convert that into an explicit failure (reject/throw or return written:
false and log via the same logger used in queuePublicStatusRollupWrite),
including context like key and which buildPublicStatusRollupField(increment)
failed; ensure you still preserve setting coverageStartKey and TTL behavior
(ROLLUP_TTL_SECONDS) but fail the outer operation when any pipeline entry
errored.

In `@src/repository/message.ts`:
- Around line 71-105: Currently queuePublicStatusRollupForFinalDetails consumes
the seed immediately via consumePublicStatusRequestSeed(id) which can
permanently lose the seed if later steps fail; change the flow to first
read/peek the seed (use a non-destructive read or a new
getPublicStatusRequestSeed/peekPublicStatusRequestSeed helper), then mark the
seed as in-flight (introduce or call an "markPublicStatusRequestInFlight"
operation tied to the id), perform getConfiguredPublicStatusGroupsForRollup()
and await queuePublicStatusRollupWrite(...), and only after
queuePublicStatusRollupWrite succeeds call consumePublicStatusRequestSeed(id) to
remove the seed; ensure on error you clear the in-flight mark (or leave it for
retry policies) and keep logging in the catch block (reference functions:
queuePublicStatusRollupForFinalDetails, consumePublicStatusRequestSeed,
queuePublicStatusRollupWrite).

---

Nitpick comments:
In `@src/lib/public-status/rollup-store.ts`:
- Around line 9-20: The imports in rollup-store.ts mix relative paths with the
project alias; replace all local-src imports that use "./..." with the "`@/`..."
alias so they consistently map to ./src/ — e.g., change imports that bring in
computeTokensPerSecond, PublicStatusConfiguredGroup,
InternalPublicStatusConfigSnapshot,
readCurrentInternalPublicStatusConfigSnapshot, PublicStatusPayload,
PublicStatusTimelineBucket, alignBucketStartUtc,
buildPublicStatusRollupCoverageStartKey, buildPublicStatusRollupKey, and
PUBLIC_STATUS_ROLLUP_BUCKET_MINUTES to use the "`@/`..." path prefix; ensure
export names and types are unchanged and tests/build still resolve the updated
module specifiers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0738d741-65b2-44fe-89cd-04c4d0515b75

📥 Commits

Reviewing files that changed from the base of the PR and between e0d0978 and 57104e1.

📒 Files selected for processing (26)
  • src/app/[locale]/status/[slug]/page.tsx
  • src/app/[locale]/status/_components/public-status-timeline.tsx
  • src/app/[locale]/status/_components/public-status-view.tsx
  • src/lib/model-vendor-icons.tsx
  • src/lib/model-vendor-rules.ts
  • src/lib/public-status/aggregation-core.ts
  • src/lib/public-status/aggregation.ts
  • src/lib/public-status/config-publisher.ts
  • src/lib/public-status/config-snapshot.ts
  • src/lib/public-status/read-store.ts
  • src/lib/public-status/rebuild-hints.ts
  • src/lib/public-status/rebuild-worker.ts
  • src/lib/public-status/redis-contract.ts
  • src/lib/public-status/rollup-store.ts
  • src/lib/public-status/scheduler.ts
  • src/lib/public-status/vendor-icon-key.ts
  • src/repository/message.ts
  • tests/unit/public-status/config-publisher.test.ts
  • tests/unit/public-status/config-snapshot.test.ts
  • tests/unit/public-status/no-db-import-guard.test.ts
  • tests/unit/public-status/public-status-view.test.tsx
  • tests/unit/public-status/read-store.test.ts
  • tests/unit/public-status/rebuild-worker.test.ts
  • tests/unit/public-status/redis-contract.test.ts
  • tests/unit/public-status/rollup-store.test.ts
  • tests/unit/repository/message-public-status-rollup.test.ts

Comment thread src/app/[locale]/status/_components/public-status-view.tsx
Comment thread src/lib/model-vendor-icons.tsx Outdated
Comment thread src/lib/model-vendor-rules.ts Outdated
Comment thread src/lib/public-status/rollup-store.ts
Comment thread src/lib/public-status/rollup-store.ts Outdated
Comment thread src/lib/public-status/rollup-store.ts
Comment thread src/repository/message.ts
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a major architectural shift for the public status system, moving from on-demand database queries to a Redis-based rollup mechanism (v2). It includes a robust migration path that serves legacy v1 projections until the new rollup window is fully covered. Key changes include the introduction of a rollup-store for high-frequency metric increments, UI refactoring of the status timeline to improve accessibility and performance, and logic to reuse timeline data during polling. Feedback focuses on the reliability of the in-memory metadata cache in distributed environments, performance optimizations for rollup aggregation loops, and ensuring the reliability of migration markers in Redis.

Comment thread src/repository/message.ts
model?: string;
};

const publicStatusRequestSeedCache = new Map<number, PublicStatusRequestSeed>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using an in-memory Map (publicStatusRequestSeedCache) to store request metadata between createMessageRequest and updateMessageRequestDetails is unreliable in distributed or serverless environments. If the request lifecycle spans multiple instances or if the process restarts, the 'seed' data will be lost, resulting in missing rollup metrics. Consider storing this temporary state in Redis with a short TTL or fetching the necessary fields (createdAt, originalModel) from the database during the update call if they are not already available in the context.

Comment on lines +395 to +403
const rollupSampleCount = rollupBuckets.reduce((sum, bucket) => {
for (const [field, value] of bucket.values) {
const parsedField = parsePublicStatusRollupField(field);
if (parsedField?.metric === "success" || parsedField?.metric === "failure") {
sum += value;
}
}
return sum;
}, 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current logic for calculating rollupSampleCount parses every field in every bucket using parsePublicStatusRollupField, which involves string splitting and URI decoding. Given that this loop can run hundreds of thousands of times (buckets × groups × models × metrics), this is quite expensive. Since we only need to identify 'success' and 'failure' metrics, a simple string suffix check would be significantly more efficient.

        const rollupSampleCount = rollupBuckets.reduce((sum, bucket) => {
          for (const [field, value] of bucket.values) {
            if (field.endsWith("|success") || field.endsWith("|failure")) {
              sum += value;
            }
          }
          return sum;
        }, 0);

pipeline.hincrbyfloat(key, buildPublicStatusRollupField(increment), increment.value);
}
if (typeof pipeline.set === "function") {
pipeline.set(coverageStartKey, bucketStartIso, "NX");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The coverageStartKey is critical for the migration logic that determines when to switch from legacy v1 projections to v2 rollups. In the pipeline block, the set command is guarded by a type check (typeof pipeline.set === "function"). If this check fails, the coverage start timestamp will never be recorded, potentially causing the status page to stay on legacy data indefinitely. While most Redis clients support set, it's safer to ensure this operation is performed or logged if it fails.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d75f47279b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/repository/message.ts Outdated
Comment on lines +122 to +123
if (!isPublicStatusFinalDetails(details) || !claimPublicStatusFinalization(id)) {
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Delay finalization claim until rollup write is actually possible

The request is marked finalized before the code verifies that rollup prerequisites exist, so transient config unavailability can permanently drop data. In queuePublicStatusRollupForFinalDetails, claimPublicStatusFinalization(id) runs first, but if getConfiguredPublicStatusGroupsForRollup() returns an empty list (for example during startup/bootstrap or a temporary Redis/config miss), the function returns early and later calls for the same request are blocked by the finalized cache. That means success/failure metrics for those requests are never written to rollups, creating silent gaps in public status accuracy.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/public-status/public-status-view.test.tsx (1)

610-677: ⚡ Quick win

确保 fake timers 清理总能执行,避免后续用例被污染。

当前 vi.useRealTimers() / unmount() 放在断言后;若中途断言失败,清理不会执行,可能导致后续测试串扰。建议用 try/finally 包裹。

建议修改
-    await act(async () => {
-      vi.advanceTimersByTime(30_000);
-      await Promise.resolve();
-    });
-
-    const text = container.textContent || "";
-    expect(text).toContain("GPT-4.2");
-    expect(text).toContain("Failed");
-    expect(text).toContain("0.00%");
-
-    vi.useRealTimers();
-    unmount();
+    try {
+      await act(async () => {
+        vi.advanceTimersByTime(30_000);
+        await Promise.resolve();
+      });
+
+      const text = container.textContent || "";
+      expect(text).toContain("GPT-4.2");
+      expect(text).toContain("Failed");
+      expect(text).toContain("0.00%");
+    } finally {
+      vi.useRealTimers();
+      unmount();
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/public-status/public-status-view.test.tsx` around lines 610 - 677,
The test "uses server summary metrics when polling returns a new model without
timeline" currently calls vi.useRealTimers() and unmount() after assertions,
which can leak fake timers or mounted components if an assertion throws; wrap
the polling/advance and assertions in a try/finally so that vi.useRealTimers()
and unmount() are invoked in the finally block (ensure unmount() and
vi.useRealTimers() are still called even on failure), locating the cleanup
around the render(...) result (container, unmount) and the
vi.useFakeTimers()/vi.advanceTimersByTime(...) section.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/public-status/public-status-view.test.tsx`:
- Around line 610-677: The test "uses server summary metrics when polling
returns a new model without timeline" currently calls vi.useRealTimers() and
unmount() after assertions, which can leak fake timers or mounted components if
an assertion throws; wrap the polling/advance and assertions in a try/finally so
that vi.useRealTimers() and unmount() are invoked in the finally block (ensure
unmount() and vi.useRealTimers() are still called even on failure), locating the
cleanup around the render(...) result (container, unmount) and the
vi.useFakeTimers()/vi.advanceTimersByTime(...) section.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8c9e5ec-75fc-40f2-b2fa-7f390f2cf099

📥 Commits

Reviewing files that changed from the base of the PR and between d75f472 and 2b9e35c.

📒 Files selected for processing (12)
  • src/app/[locale]/status/_components/public-status-view.tsx
  • src/lib/model-vendor-icons.test.ts
  • src/lib/model-vendor-icons.tsx
  • src/lib/model-vendor-rules.ts
  • src/lib/public-status/aggregation.ts
  • src/lib/public-status/rebuild-worker.ts
  • src/lib/public-status/rollup-store.ts
  • src/repository/message.ts
  • tests/unit/public-status/aggregation.test.ts
  • tests/unit/public-status/public-status-view.test.tsx
  • tests/unit/public-status/rollup-store.test.ts
  • tests/unit/repository/message-public-status-rollup.test.ts

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b9e35c6fc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +516 to +519
const results = await pipeline.exec();
for (let batchIndex = 0; batchIndex < batchStarts.length; batchIndex++) {
const [error, raw] = results?.[batchIndex] ?? [null, null];
buckets.push(parseBucket(batchStarts[batchIndex]!, error ? null : raw));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail rebuild when rollup pipeline returns null results

Treating pipeline.exec() returning null as an all-empty batch silently publishes a fresh snapshot with zeroed rollup data, which can temporarily wipe status visibility to no_data even when real rollup data exists. In this code path, each bucket is parsed from results?.[batchIndex] ?? [null, null], so a null exec is downgraded into empty buckets instead of a rebuild failure/retry signal.

Useful? React with 👍 / 👎.

Comment on lines +347 to +349
await input.triggerRebuildHint("rollup-coverage-incomplete");
await input.triggerRebuildHint("legacy-generation");
if (
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Coalesce rebuild hints on rollup-coverage fallback path

This branch awaits multiple triggerRebuildHint calls sequentially for a single read, which turns one status-page poll into multiple blocking hint round-trips during the rollout window (rollupCoverageComplete === false). Because hint scheduling now short-circuits via TTL checks, these extra awaited calls mostly add latency on the hot read path without improving correctness; they should be coalesced into one hint or made non-blocking.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/unit/public-status/public-status-view.test.tsx (2)

407-513: ⚡ Quick win

建议添加 try/finally 以确保清理代码执行。

此测试使用了 fake timers,但清理逻辑(vi.useRealTimers()unmount())未包裹在 finally 块中。为保证测试可靠性并与本 PR 中其他测试的模式保持一致,建议采用 try/finally 包裹。

♻️ 建议的重构方案
-    await act(async () => {
-      await Promise.resolve();
-    });
-
-    expect(fetchMock).toHaveBeenCalledWith(
-      "/api/public-status?groupSlug=platform&include=meta%2Cdefaults%2Cgroups",
-      { cache: "no-store" }
-    );
-
-    await act(async () => {
-      vi.advanceTimersByTime(30_000);
-      await Promise.resolve();
-    });
-
-    const text = container.textContent || "";
-    expect(text).toContain("Platform Model");
-    expect(text).not.toContain("OpenAI Model");
-    expect(container.querySelectorAll('[data-testid="sortable-group-panel"]')).toHaveLength(1);
-
-    vi.useRealTimers();
-    unmount();
+    try {
+      await act(async () => {
+        await Promise.resolve();
+      });
+
+      expect(fetchMock).toHaveBeenCalledWith(
+        "/api/public-status?groupSlug=platform&include=meta%2Cdefaults%2Cgroups",
+        { cache: "no-store" }
+      );
+
+      await act(async () => {
+        vi.advanceTimersByTime(30_000);
+        await Promise.resolve();
+      });
+
+      const text = container.textContent || "";
+      expect(text).toContain("Platform Model");
+      expect(text).not.toContain("OpenAI Model");
+      expect(container.querySelectorAll('[data-testid="sortable-group-panel"]')).toHaveLength(1);
+    } finally {
+      vi.useRealTimers();
+      unmount();
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/public-status/public-status-view.test.tsx` around lines 407 - 513,
The test "keeps filterSlug-scoped default group after polling refresh" uses
vi.useFakeTimers() but leaves cleanup (vi.useRealTimers() and unmount()) outside
a finally block; wrap the main test body (everything after vi.useFakeTimers())
in a try block and move vi.useRealTimers() and unmount() into a finally so the
fake timers are always restored and the component always unmounted (references:
vi.useFakeTimers, vi.useRealTimers, unmount, and the fetchMock/container usage
inside the test).

277-325: ⚡ Quick win

建议添加 try/finally 以确保清理代码执行。

此测试使用了 fake timers,但未将清理逻辑(vi.useRealTimers()unmount())包裹在 finally 块中。如果断言失败,清理代码将不会执行,可能导致测试污染。建议采用与 lines 595-609 和 667-680 相同的模式。

♻️ 建议的重构方案
-    await act(async () => {
-      await Promise.resolve();
-    });
-
-    expect(fetchMock).toHaveBeenCalledWith(
-      "/api/public-status?interval=5&rangeHours=24&include=meta%2Cdefaults%2Cgroups",
-      { cache: "no-store" }
-    );
-
-    await act(async () => {
-      vi.advanceTimersByTime(30_000);
-      await Promise.resolve();
-    });
-
-    expect(container.textContent).toContain("Refresh delayed");
-    expect(container.textContent).toContain("Preparing first snapshot");
-
-    vi.useRealTimers();
-    unmount();
+    try {
+      await act(async () => {
+        await Promise.resolve();
+      });
+
+      expect(fetchMock).toHaveBeenCalledWith(
+        "/api/public-status?interval=5&rangeHours=24&include=meta%2Cdefaults%2Cgroups",
+        { cache: "no-store" }
+      );
+
+      await act(async () => {
+        vi.advanceTimersByTime(30_000);
+        await Promise.resolve();
+      });
+
+      expect(container.textContent).toContain("Refresh delayed");
+      expect(container.textContent).toContain("Preparing first snapshot");
+    } finally {
+      vi.useRealTimers();
+      unmount();
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/public-status/public-status-view.test.tsx` around lines 277 - 325,
Wrap the fake-timer test body in a try/finally to guarantee cleanup: start by
calling vi.useFakeTimers(), render PublicStatusView and run assertions inside
the try block, and in the finally block always call vi.useRealTimers() and
unmount() (referencing the existing vi.useFakeTimers(), vi.useRealTimers(),
unmount() and fetchMock usage) so timers and DOM are restored even if an
assertion fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/public-status/public-status-view.test.tsx`:
- Around line 407-513: The test "keeps filterSlug-scoped default group after
polling refresh" uses vi.useFakeTimers() but leaves cleanup (vi.useRealTimers()
and unmount()) outside a finally block; wrap the main test body (everything
after vi.useFakeTimers()) in a try block and move vi.useRealTimers() and
unmount() into a finally so the fake timers are always restored and the
component always unmounted (references: vi.useFakeTimers, vi.useRealTimers,
unmount, and the fetchMock/container usage inside the test).
- Around line 277-325: Wrap the fake-timer test body in a try/finally to
guarantee cleanup: start by calling vi.useFakeTimers(), render PublicStatusView
and run assertions inside the try block, and in the finally block always call
vi.useRealTimers() and unmount() (referencing the existing vi.useFakeTimers(),
vi.useRealTimers(), unmount() and fetchMock usage) so timers and DOM are
restored even if an assertion fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6c2fc430-8d77-49e5-92dc-e166084bafbd

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9e35c and 4da03aa.

📒 Files selected for processing (1)
  • tests/unit/public-status/public-status-view.test.tsx

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4da03aa142

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +518 to +519
const [error, raw] = results?.[batchIndex] ?? [null, null];
buckets.push(parseBucket(batchStarts[batchIndex]!, error ? null : raw));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail rollup rebuild on Redis pipeline command errors

When batched HGETALL calls return per-command errors, this code silently treats those buckets as empty (error ? null : raw) and continues rebuilding, which can publish a new snapshot with partial/zeroed data instead of preserving the last good projection. In production, transient Redis command failures during a rebuild will therefore degrade public status accuracy without surfacing a rebuild failure signal.

Useful? React with 👍 / 👎.

Comment on lines +220 to +222
if (!previousGroup) {
return group;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Load timeline when transitioning from no snapshot state

Polling always requests include=meta,defaults,groups (no timeline), and when a group appears for the first time this branch returns it as-is, leaving timeline empty. If a user opens the page while the system is rebuilding (initial payload has no groups), then later receives the first ready payload via polling, charts never get real history data and remain empty until a full page reload.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: add2a1cb5c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +361 to +362
if (groupsByRollupId.get(groupId) !== group) {
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle rollup ID collisions instead of silently dropping increments

When two configured groups resolve to the same rollup key string (for example one group has sourceGroupId = 5 while another falls back to sourceGroupName = "5"), groupsByRollupId becomes last-write-wins and this guard drops one group’s samples entirely. In that collision case, increments for the overwritten group are skipped with no error/log, so success/failure and latency rollups are silently undercounted for affected public groups.

Useful? React with 👍 / 👎.

@tesgth032 tesgth032 changed the title 优化公开状态页 Redis 聚合与轮询性能 [未完成] 优化公开状态页 Redis 聚合与轮询性能 May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core area:statistics enhancement New feature or request size/XL Extra Large PR (> 1000 lines)

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant