fix(public-status): persist hourly rollups#1156
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthrough该 PR 引入按小时的公开状态汇总:新增 DB 表与索引、实现按小时汇总的计算/持久化与 Redis 当前小时缓存、将读取路径与重建流程调整为使用汇总数据,并更新 API 路由以支持并优先使用内部配置快照。 ChangesPublic Status Hourly Rollups
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a database-backed hourly rollup system for public status reporting, replacing the previous purely Redis-based approach. It adds a new public_status_hourly_rollups table, logic to aggregate request data into hourly buckets, and updates the API route and rebuild worker to utilize these rollups for historical data. Review feedback highlights a significant performance regression in the rebuild worker due to sequential database queries, identifies a loss of granularity in status classification (missing 'degraded' state), and suggests improvements for the median calculation and Drizzle ORM usage.
| const finalizedHourStarts: Date[] = []; | ||
| for ( | ||
| let cursorMs = Date.parse(rollupCoveredFrom); | ||
| cursorMs < Date.parse(coveredTo); | ||
| cursorMs += 60 * 60 * 1000 | ||
| ) { | ||
| finalizedHourStarts.push(new Date(cursorMs)); | ||
| } | ||
| for (const hourStartBatch of chunkArray(finalizedHourStarts, ROLLUP_WRITE_BATCH_SIZE)) { | ||
| await Promise.all( | ||
| hourStartBatch.map((hourStart) => | ||
| buildAndPersistPublicStatusHourlyRollup({ | ||
| configVersion: configSnapshot.configVersion, | ||
| hourStart, | ||
| groups, | ||
| }) | ||
| ) | ||
| ); | ||
| } |
There was a problem hiding this comment.
The current implementation introduces a significant performance regression by performing 720 sequential batches of database queries (one for each hour in the 30-day retention window) every time a rebuild is triggered.
In the previous implementation, a single query was used to fetch the entire range. Performing 720 individual queries to the message_request table on every worker run (which can be triggered frequently via hints) will put excessive and unnecessary load on the database.
Consider optimizing this by:
- Querying the entire range in a single query (or larger daily chunks) and aggregating the results in memory.
- Implementing logic to only process hours that are missing from the public_status_hourly_rollups table or are very recent (e.g., the last 2 hours) to account for late-arriving data.
| label: model.label, | ||
| vendorIconKey: model.vendorIconKey, | ||
| requestTypeBadge: model.requestTypeBadge, | ||
| state: sampleCount === 0 ? "no_data" : bucket.successCount > 0 ? "operational" : "failed", |
There was a problem hiding this comment.
The state calculation logic is overly lenient and results in a regression where the degraded status is lost.
Currently, any hour with at least one success is marked as operational, even if the failure rate is extremely high (e.g., 1 success and 99 failures). Furthermore, the degraded state, which is explicitly handled in the read path (read-store.ts), is omitted here and in the database schema definition.
It is recommended to use a threshold-based approach for status classification (e.g., availabilityPct > 99.9 ? 'operational' : availabilityPct > 95 ? 'degraded' : 'failed') and update the schema to support the degraded value.
| function median(values: number[]): number | null { | ||
| if (values.length === 0) { | ||
| return null; | ||
| } | ||
|
|
||
| const sorted = [...values].sort((left, right) => left - right); | ||
| const middle = Math.floor(sorted.length / 2); | ||
| if (sorted.length % 2 === 0) { | ||
| const left = sorted[middle - 1]; | ||
| const right = sorted[middle]; | ||
| if (typeof left !== "number" || typeof right !== "number") { | ||
| return null; | ||
| } | ||
| return Number(((left + right) / 2).toFixed(4)); | ||
| } | ||
|
|
||
| return sorted[middle] ?? null; | ||
| } |
There was a problem hiding this comment.
The median function has inconsistent rounding logic and redundant type checks. It rounds the result to 4 decimal places when the input length is even, but returns the raw value when the length is odd. Additionally, since the input is typed as number[], the typeof checks for left and right are unnecessary.
function median(values: number[]): number | null {
if (values.length === 0) {
return null;
}
const sorted = [...values].sort((left, right) => left - right);
const middle = Math.floor(sorted.length / 2);
const result = sorted.length % 2 === 0
? (sorted[middle - 1] + sorted[middle]) / 2
: sorted[middle];
return result !== undefined ? Number(result.toFixed(4)) : null;
}| lt(publicStatusHourlyRollups.bucketStart, input.end), | ||
| ]; | ||
| if (input.configVersion) { | ||
| conditions.push(sql`${publicStatusHourlyRollups.configVersion} = ${input.configVersion}`); |
There was a problem hiding this comment.
Using the sql template literal for a simple equality check is less idiomatic in Drizzle than using the built-in eq operator. This improves type safety and readability.
| conditions.push(sql`${publicStatusHourlyRollups.configVersion} = ${input.configVersion}`); | |
| conditions.push(eq(publicStatusHourlyRollups.configVersion, input.configVersion)); |
| vendorIconKey: model.vendorIconKey, | ||
| requestTypeBadge: model.requestTypeBadge, | ||
| state: sampleCount === 0 ? "no_data" : bucket.successCount > 0 ? "operational" : "failed", | ||
| successCount: bucket.successCount, |
There was a problem hiding this comment.
[Medium] [LOGIC-BUG] Timezone/DST bug in retention cutoff calculation
Why this is a problem: The cutoff calculation uses millisecond arithmetic (retentionDays * 24 * 60 * 60 * 1000) which assumes all days have exactly 86400 seconds. This is incorrect during DST transitions, potentially causing data to be retained longer than 30 days or deleted prematurely.
Suggested fix:
// Use date truncation for calendar-aware cutoff
const cutoff = new Date(now.getTime());
cutoff.setUTCDate(cutoff.getUTCDate() - retentionDays);
cutoff.setUTCHours(0, 0, 0, 0); // Align to day boundary for consistencyThere was a problem hiding this comment.
Code Review Summary
This PR implements a significant architectural improvement to reduce Redis memory growth by moving finalized public status history from Redis to persistent PostgreSQL hourly rollups. The implementation is well-structured and follows existing code patterns. One minor time calculation issue was identified.
PR Size: XL
- Lines changed: 6211 (6086 additions + 125 deletions)
- Files changed: 12
- Split suggestion: Consider splitting large snapshot JSON files into separate PRs in future migrations to reduce review load.
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 1 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Medium Priority Issues (Should Fix)
- [LOGIC-BUG] Timezone/DST bug in retention cutoff calculation (hourly-rollups.ts:271)
- The cutoff calculation uses millisecond arithmetic which doesn't account for DST transitions
- Use calendar-aware date arithmetic instead
Review Coverage
- Logic and correctness - Minor DST issue found
- Security (OWASP Top 10) - Clean, no injection risks or exposed secrets
- Error handling - Clean, errors properly logged and surfaced
- Type safety - Clean, proper TypeScript typing throughout
- Documentation accuracy - Clean, Chinese comments follow existing pattern
- Test coverage - Adequate unit and integration tests included
- Code clarity - Good, well-structured with clear function names
Notable Observations (Non-blocking)
- Test Coverage: New code has good test coverage with unit tests for hourly-rollups.ts, read-store.ts, and rebuild-worker.ts
- Migration: Properly follows CLAUDE.md database migration workflow (schema.ts modified, generated migration file included)
- Redis Memory Optimization: Successfully removes 30-day long-lived Redis keys, replacing with database persistence
- Backward Compatibility: Maintains existing /api/public-status response shape and Redis fallback
Automated review by Claude AI
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 60eb8bdaf6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const [historyRows, currentRows] = await Promise.all([ | ||
| readPublicStatusHourlyRollups({ | ||
| start: historyStart, | ||
| end: currentHourStart, | ||
| configVersion: input.configVersion, | ||
| }), |
There was a problem hiding this comment.
Catch rollup read failures before fallback
The DB-first path calls readPublicStatusHourlyRollups without a guard, so any transient Postgres error rejects readPublicStatusPayload and the route returns 500 instead of falling back to the existing Redis snapshot/manifest path. This is a regression from the previous Redis-only read behavior and breaks the intended rollout safety when DB connectivity is degraded.
Useful? React with 👍 / 👎.
| const windowStart = new Date( | ||
| currentHourStart.getTime() - (input.rangeHours - 1) * 60 * 60 * 1000 | ||
| ); | ||
| const bucketCount = input.rangeHours; |
There was a problem hiding this comment.
Preserve interval semantics in rollup payloads
This payload builder hardcodes one-hour buckets (bucketCount = rangeHours and + 60 * 60 * 1000 stepping), so requests like interval=5 or interval=15 now return hourly timelines and freshness behavior inconsistent with the requested interval. Because the DB rollup path is now preferred, clients using non-60-minute intervals will receive less granular and potentially stale-looking data compared to prior behavior.
Useful? React with 👍 / 👎.
| const finalizedHourStarts: Date[] = []; | ||
| for ( | ||
| let cursorMs = Date.parse(rollupCoveredFrom); | ||
| cursorMs < Date.parse(coveredTo); | ||
| cursorMs += 60 * 60 * 1000 | ||
| ) { | ||
| finalizedHourStarts.push(new Date(cursorMs)); | ||
| } | ||
| for (const hourStartBatch of chunkArray(finalizedHourStarts, ROLLUP_WRITE_BATCH_SIZE)) { | ||
| await Promise.all( | ||
| hourStartBatch.map((hourStart) => | ||
| buildAndPersistPublicStatusHourlyRollup({ | ||
| configVersion: configSnapshot.configVersion, | ||
| hourStart, | ||
| groups, | ||
| }) | ||
| ) | ||
| ); |
There was a problem hiding this comment.
Full 720-hour re-scan on every rebuild run
ROLLUP_HISTORY_HOURS = 720. On every rebuild invocation, the loop issues a SELECT from the requests table for each of the 720 hourly windows, then upserts into public_status_hourly_rollups. Historical hours (older than 1 hour) have finalized data and will never change, yet they are re-queried and re-upserted unconditionally. After the initial backfill the work is pure overhead: 720 reads + 720 upserts × each rebuild cycle. Restricting the full-scan to hours not yet present in the rollups table (or reducing the re-scan window to recent hours after the first run) would substantially reduce this ongoing cost.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/rebuild-worker.ts
Line: 308-325
Comment:
**Full 720-hour re-scan on every rebuild run**
`ROLLUP_HISTORY_HOURS = 720`. On every rebuild invocation, the loop issues a SELECT from the requests table for each of the 720 hourly windows, then upserts into `public_status_hourly_rollups`. Historical hours (older than 1 hour) have finalized data and will never change, yet they are re-queried and re-upserted unconditionally. After the initial backfill the work is pure overhead: 720 reads + 720 upserts × each rebuild cycle. Restricting the full-scan to hours not yet present in the rollups table (or reducing the re-scan window to recent hours after the first run) would substantially reduce this ongoing cost.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,646 @@ | |||
| import { and, asc, gte, lt, lte, sql } from "drizzle-orm"; | |||
There was a problem hiding this comment.
The
configVersion filter uses a raw sql template tag instead of Drizzle's eq() helper, which is the conventional pattern for equality conditions in this codebase. Both produce parameterized queries so there is no functional difference, but using eq() is more consistent.
| import { and, asc, gte, lt, lte, sql } from "drizzle-orm"; | |
| import { and, asc, eq, gte, lt, lte, sql } from "drizzle-orm"; |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/hourly-rollups.ts
Line: 1
Comment:
The `configVersion` filter uses a raw `sql` template tag instead of Drizzle's `eq()` helper, which is the conventional pattern for equality conditions in this codebase. Both produce parameterized queries so there is no functional difference, but using `eq()` is more consistent.
```suggestion
import { and, asc, eq, gte, lt, lte, sql } from "drizzle-orm";
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| lt(publicStatusHourlyRollups.bucketStart, input.end), | ||
| ]; | ||
| if (input.configVersion) { | ||
| conditions.push(sql`${publicStatusHourlyRollups.configVersion} = ${input.configVersion}`); |
There was a problem hiding this comment.
The
configVersion equality condition is expressed via a raw sql template rather than the eq() helper. Using eq() is consistent with the rest of the filter conditions.
| conditions.push(sql`${publicStatusHourlyRollups.configVersion} = ${input.configVersion}`); | |
| conditions.push(eq(publicStatusHourlyRollups.configVersion, input.configVersion)); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/hourly-rollups.ts
Line: 351
Comment:
The `configVersion` equality condition is expressed via a raw `sql` template rather than the `eq()` helper. Using `eq()` is consistent with the rest of the filter conditions.
```suggestion
conditions.push(eq(publicStatusHourlyRollups.configVersion, input.configVersion));
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/public-status/rebuild-worker.ts (1)
274-326:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win修复
hourStart传参的语义不匹配虽然
buildAndPersistPublicStatusHourlyRollup内部会调用alignHourStartUtc(input.hourStart)来对齐参数(第 289 行),但调用方在循环中传入的cursorMs(如09:15Z)并未显式对齐到整点。这造成语义混淆:参数名为hourStart暗示应接收整点时间,而实际传入的是基于intervalMinutes边界的非整点值。虽然内部对齐机制防止了数据损坏,但建议显式传入整点对齐的值,以遵守参数的语义约定:
建议修复
const manifestCoveredFrom = new Date( Date.parse(coveredTo) - input.rangeHours * 60 * 60 * 1000 ).toISOString(); - const rollupCoveredFrom = new Date( - Date.parse(coveredTo) - ROLLUP_HISTORY_HOURS * 60 * 60 * 1000 - ).toISOString(); + const currentHourStart = alignHourStartUtc(now); + const rollupCoveredFromMs = currentHourStart.getTime() - ROLLUP_HISTORY_HOURS * 60 * 60 * 1000;const finalizedHourStarts: Date[] = []; for ( - let cursorMs = Date.parse(rollupCoveredFrom); - cursorMs < Date.parse(coveredTo); + let cursorMs = rollupCoveredFromMs; + cursorMs < currentHourStart.getTime(); cursorMs += 60 * 60 * 1000 ) { finalizedHourStarts.push(new Date(cursorMs)); }} - const currentHourStart = alignHourStartUtc(now); const currentHourRequests = await queryPublicStatusRequests({此外,建议在
tests/unit/public-status/rebuild-worker.test.ts中补充intervalMinutes=15且now非整点(如2026-04-21T10:17:00.000Z)的测试用例,以覆盖这一边界情况。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/public-status/rebuild-worker.ts` around lines 274 - 326, 循环中用来生成 finalizedHourStarts 的 cursorMs 可能不是整点(例如 09:15Z),导致传入 buildAndPersistPublicStatusHourlyRollup 的 hourStart 语义不明确;虽然该函数内部会调用 alignHourStartUtc,但调用方应显式传入对齐后的整点时间以保持语义一致。修复方法:在构造 finalizedHourStarts 时用 alignHourStartUtc(new Date(cursorMs))(或等效方式)将每个 hourStart 对齐到整点(替换当前直接 new Date(cursorMs) 的做法),继续通过 chunkArray 批量调用 buildAndPersistPublicStatusHourlyRollup 如前;并新增单元测试(tests/unit/public-status/rebuild-worker.test.ts)覆盖 intervalMinutes=15 且 now 为非整点(例如 2026-04-21T10:17:00.000Z)的场景以验证行为。
🧹 Nitpick comments (1)
src/lib/public-status/rebuild-worker.ts (1)
309-313: 💤 Low value循环条件中
Date.parse(coveredTo)每轮重复计算
Date.parse(coveredTo)在每轮迭代都重新解析字符串。即便循环上限只有 720 次也算不上瓶颈,但提到循环外缓存能让意图更清晰,也方便后续按上面建议改成与currentHourStart对齐:建议优化
- for ( - let cursorMs = Date.parse(rollupCoveredFrom); - cursorMs < Date.parse(coveredTo); - cursorMs += 60 * 60 * 1000 - ) { + const rollupFromMs = Date.parse(rollupCoveredFrom); + const rollupToMs = Date.parse(coveredTo); + for (let cursorMs = rollupFromMs; cursorMs < rollupToMs; cursorMs += 60 * 60 * 1000) { finalizedHourStarts.push(new Date(cursorMs)); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/public-status/rebuild-worker.ts` around lines 309 - 313, The loop repeatedly calls Date.parse(coveredTo) each iteration; compute and reuse a parsed value (e.g., const coveredToMs = Date.parse(coveredTo)) before the for loop and change the loop condition to use coveredToMs instead of Date.parse(coveredTo). Also consider aligning the loop bounds to the hourly boundary used elsewhere by using the existing currentHourStart (or deriving an aligned boundary) for the upper limit so cursorMs and currentHourStart are on the same hour boundaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/public-status/hourly-rollups.ts`:
- Around line 52-69: The median function returns inconsistent precision: the
even- count branch rounds to 4 decimal places while the odd- count branch
returns the raw number; update median (function median) to normalize precision
for both cases (e.g., round the odd-case value to 4 decimals the same way as the
even case or use a shared roundTo4 helper) so TPS values (from
computeTokensPerSecond / tps) are stored with consistent precision.
In `@src/lib/public-status/rebuild-worker.ts`:
- Around line 308-326: The loop currently rebuilds every hourly bucket in the
retention window by generating finalizedHourStarts and upserting all of them,
causing heavy DB load; change the logic in rebuild-worker.ts so instead of
iterating Date.parse(rollupCoveredFrom) .. coveredTo you compute only the
most-recent finalized hours plus the current hour (e.g., RECENT_FINALIZED_HOURS
= 2..6), by calculating finalizedFromMs = currentHourStartMs -
RECENT_FINALIZED_HOURS * 60*60*1000 and iterating from finalizedFromMs to
currentHourStartMs (and include current hour if desired), then feed those hour
starts to chunkArray and call buildAndPersistPublicStatusHourlyRollup as before
(keep ROLLUP_WRITE_BATCH_SIZE batching). Also update the unit test that asserts
30*24 calls (tests/unit/public-status/rebuild-worker.test.ts) to match the new
RECENT_FINALIZED_HOURS behavior or add a separate backfill-only code path for
full 30-day runs.
---
Outside diff comments:
In `@src/lib/public-status/rebuild-worker.ts`:
- Around line 274-326: 循环中用来生成 finalizedHourStarts 的 cursorMs 可能不是整点(例如
09:15Z),导致传入 buildAndPersistPublicStatusHourlyRollup 的 hourStart
语义不明确;虽然该函数内部会调用 alignHourStartUtc,但调用方应显式传入对齐后的整点时间以保持语义一致。修复方法:在构造
finalizedHourStarts 时用 alignHourStartUtc(new Date(cursorMs))(或等效方式)将每个 hourStart
对齐到整点(替换当前直接 new Date(cursorMs) 的做法),继续通过 chunkArray 批量调用
buildAndPersistPublicStatusHourlyRollup
如前;并新增单元测试(tests/unit/public-status/rebuild-worker.test.ts)覆盖 intervalMinutes=15
且 now 为非整点(例如 2026-04-21T10:17:00.000Z)的场景以验证行为。
---
Nitpick comments:
In `@src/lib/public-status/rebuild-worker.ts`:
- Around line 309-313: The loop repeatedly calls Date.parse(coveredTo) each
iteration; compute and reuse a parsed value (e.g., const coveredToMs =
Date.parse(coveredTo)) before the for loop and change the loop condition to use
coveredToMs instead of Date.parse(coveredTo). Also consider aligning the loop
bounds to the hourly boundary used elsewhere by using the existing
currentHourStart (or deriving an aligned boundary) for the upper limit so
cursorMs and currentHourStart are on the same hour boundaries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b9fe64d6-c5bf-485d-ba21-1affb01966c6
📒 Files selected for processing (12)
drizzle/0103_public_status_hourly_rollups.sqldrizzle/meta/0103_snapshot.jsondrizzle/meta/_journal.jsonsrc/app/api/public-status/route.tssrc/drizzle/schema.tssrc/lib/public-status/hourly-rollups.tssrc/lib/public-status/read-store.tssrc/lib/public-status/rebuild-worker.tstests/integration/public-status/route-redis-only.test.tstests/unit/public-status/hourly-rollups.test.tstests/unit/public-status/read-store.test.tstests/unit/public-status/rebuild-worker.test.ts
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/unit/public-status/rebuild-worker.test.ts (2)
175-178: ⚡ Quick win
upsertPublicStatusHourlyRollups的正向调用路径(非空行数据)在此缺少覆盖当前测试通过将
mockBuildPublicStatusHourlyRollupsFromRequests全程设置为返回[]来验证"空数据不触发 upsert"的优化路径(第 175 行与第 498 行),但 始终未验证当 build 函数返回非空rows时,rebuild-worker 是否确实调用了upsertPublicStatusHourlyRollups。本 PR 的核心目标正是将汇总数据持久化到 PostgreSQL,而唯一涉及 DB 写入的代码路径未被任何 rebuild-worker 层面的测试用例覆盖。建议补充一个测试用例,令
mockBuildPublicStatusHourlyRollupsFromRequests在若干特定 hour 返回非空行,并断言mockUpsertPublicStatusHourlyRollups被调用且传入了正确的行数据,例如:+ it("calls upsertPublicStatusHourlyRollups with non-empty rollup rows during rebuild", async () => { + const mod = await importRebuildWorkerModule(); + + const fakeRow = { configVersion: "cfg-1", hourStart: "2026-04-21T09:00:00.000Z" }; + mockBuildPublicStatusHourlyRollupsFromRequests.mockImplementation(({ hourStart }) => { + const h = hourStart instanceof Date ? hourStart : new Date(hourStart); + return h.getUTCHours() === 9 ? [fakeRow] : []; + }); + + mockReadCurrentInternalPublicStatusConfigSnapshot.mockResolvedValue({ + configVersion: "cfg-1", + generatedAt: "2026-04-21T10:00:00.000Z", + siteTitle: "Claude Code Hub Status", + siteDescription: "Request-derived public status", + defaultIntervalMinutes: 5, + defaultRangeHours: 24, + groups: [], + }); + mockQueryPublicStatusRequests.mockResolvedValue([]); + mockRedisSet.mockResolvedValue("OK"); + + await mod.rebuildPublicStatusProjection({ + intervalMinutes: 5, + rangeHours: 24, + now: new Date("2026-04-21T10:02:00.000Z"), + }); + + expect(mockUpsertPublicStatusHourlyRollups).toHaveBeenCalledWith( + expect.arrayContaining([fakeRow]) + ); + });Also applies to: 485-498
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/public-status/rebuild-worker.test.ts` around lines 175 - 178, Add a positive-path unit test in rebuild-worker.test.ts that configures mockBuildPublicStatusHourlyRollupsFromRequests to return non-empty rows for one or more hours and then asserts that mockUpsertPublicStatusHourlyRollups is called exactly once (or expected times) with those same rows; make sure to also assert any related calls (e.g., mockWriteCurrentHourPublicStatusSummary or mockCleanupPublicStatusHourlyRollups) behave as expected in this scenario, and restore/clear mocks between tests so the new case doesn't interfere with the existing empty-row test that returns [].
102-114: 💤 Low value
alignHourStartUtc在 mock 中被重新实现,存在与真实逻辑漂移的风险第 104–109 行在 mock 中完整复制了
alignHourStartUtc的逻辑。如果src/lib/public-status/hourly-rollups.ts中的真实实现(例如时区处理或边界条件)日后发生变化,此处的 mock 不会感知,测试断言会静默地验证错误行为。建议将 mock 中的
alignHourStartUtc替换为透传到真实模块的方式(部分 mock + 保留真实函数),或至少添加注释声明它与真实实现保持同步的意图,以降低维护负担。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/public-status/rebuild-worker.test.ts` around lines 102 - 114, The mock reimplements alignHourStartUtc which risks drifting from the real implementation; change the mock to reuse the real alignHourStartUtc instead of copying logic (e.g., in your vi.doMock return, call vi.importActual to obtain the real module and pass through alignHourStartUtc or only partially mock other exports), or if you must keep a local stub, add a clear comment above alignHourStartUtc stating it must mirror src/lib/public-status/hourly-rollups.ts and include a test to detect divergence; reference the mocked function name alignHourStartUtc and the mocked module buildPublicStatusHourlyRollupsFromRequests/cleanupPublicStatusHourlyRollups/upsertPublicStatusHourlyRollups/writeCurrentHourPublicStatusSummary so you only override those while preserving the real alignHourStartUtc.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/public-status/rebuild-worker.test.ts`:
- Around line 175-178: Add a positive-path unit test in rebuild-worker.test.ts
that configures mockBuildPublicStatusHourlyRollupsFromRequests to return
non-empty rows for one or more hours and then asserts that
mockUpsertPublicStatusHourlyRollups is called exactly once (or expected times)
with those same rows; make sure to also assert any related calls (e.g.,
mockWriteCurrentHourPublicStatusSummary or mockCleanupPublicStatusHourlyRollups)
behave as expected in this scenario, and restore/clear mocks between tests so
the new case doesn't interfere with the existing empty-row test that returns [].
- Around line 102-114: The mock reimplements alignHourStartUtc which risks
drifting from the real implementation; change the mock to reuse the real
alignHourStartUtc instead of copying logic (e.g., in your vi.doMock return, call
vi.importActual to obtain the real module and pass through alignHourStartUtc or
only partially mock other exports), or if you must keep a local stub, add a
clear comment above alignHourStartUtc stating it must mirror
src/lib/public-status/hourly-rollups.ts and include a test to detect divergence;
reference the mocked function name alignHourStartUtc and the mocked module
buildPublicStatusHourlyRollupsFromRequests/cleanupPublicStatusHourlyRollups/upsertPublicStatusHourlyRollups/writeCurrentHourPublicStatusSummary
so you only override those while preserving the real alignHourStartUtc.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d16ac930-14a7-4386-a2ee-766b4c937364
📒 Files selected for processing (5)
src/drizzle/schema.tssrc/lib/public-status/hourly-rollups.tssrc/lib/public-status/rebuild-worker.tstests/unit/public-status/hourly-rollups.test.tstests/unit/public-status/rebuild-worker.test.ts
✅ Files skipped from review due to trivial changes (1)
- tests/unit/public-status/hourly-rollups.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/public-status/hourly-rollups.ts
- src/lib/public-status/rebuild-worker.ts
ced80b6 to
c71aeb3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c71aeb3c4a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| })); | ||
|
|
||
| return { | ||
| rebuildState: input.rows.length > 0 ? "fresh" : "no-data", |
There was a problem hiding this comment.
Mark stale DB rollup responses before returning
When any historical row exists, this function always emits rebuildState: "fresh", so readPublicStatusPayload returns early and never triggers a rebuild hint even if the current-hour cache is missing or the newest rollup is old. In practice, if the rebuild worker stalls (or current-hour Redis TTL expires), clients can receive hours-old data as ready/fresh indefinitely instead of stale/rebuilding, which hides data freshness regressions and suppresses recovery signaling.
Useful? React with 👍 / 👎.
Summary
This PR reduces public-status Redis memory growth by moving finalized public status history out of Redis and into persisted hourly database rollups.
Problem
The public-status feature was storing 30 days of hourly history in Redis using long-lived keys (
public-status:v1:snapshot:*andpublic-status:v1:series:*). This caused:Solution
Move finalized hourly summaries from Redis to PostgreSQL with a 30-day retention window, keeping only the current incomplete hour in Redis with a short TTL.
Changes
Core Changes
public_status_hourly_rollupstable with proper indexes for 30-day finalized hourly history/api/public-statusresponse shape and legacy Redis snapshot fallback for rollout safetySupporting Changes
drizzle/0103_public_status_hourly_rollups.sqlwith table creation and indexesRedis Usage Changes
Redis remains in use for:
Redis is no longer the long-term storage location for 30-day public status history.
Compatibility Notes
This branch is based on
upstream/devand intentionally excludes local fork work, development mock routes, and status page UI changes. The PR only contains the public-status storage root fix, migration metadata, and focused tests.Migration
This PR includes Drizzle migration
0103which will be auto-applied ifAUTO_MIGRATE=true:Testing
Automated Tests
Verification Commands
Checklist
Enhanced by Claude AI
Greptile Summary
This PR moves 30-day public-status history out of Redis and into a new
public_status_hourly_rollupsPostgreSQL table, keeping only the current incomplete hour in Redis with a short TTL. The rebuild worker is updated to skip already-finalized hours (checking the DB first), and the read path is updated to serve the payload from DB rollups + a Redis current-hour cache before falling back to the legacy manifest/snapshot path.public_status_hourly_rollupswith a unique constraint on(bucket_start, public_group_slug, public_model_key, request_type_badge)and supporting indexes; the cleanup DELETE filters onbucket_endbut nobucket_endindex exists.findHoursNeedingRollupRefreshskips hours that already have all expected rollup rows, only unconditionally re-querying the last 2 finalized hours for stragglers.configVersionfrom the public snapshot but groups from the internal snapshot; the rebuild worker writes with the internal version, creating a window where DB and Redis lookups return empty results if the two snapshots disagree.Confidence Score: 3/5
The new DB rollup read path silently returns empty results whenever the public and internal config snapshot versions differ, falling back to legacy Redis behavior and making the migration's primary goal unreliable in production.
The config version split between the read path (public snapshot) and the write path (internal snapshot) means the primary new code path silently returns zero rows whenever the two snapshots have different versions. This is a structural issue in a critical hot path that runs on every API request.
src/app/api/public-status/route.ts and src/lib/public-status/read-store.ts — the configVersion threading between public and internal snapshots needs reconciliation before the rollup path can be relied upon.
Important Files Changed
bucket_endindex means cleanup runs a sequential scan on every rebuild cyclefindHoursNeedingRollupRefreshto skip already-finalized hours and only re-scans the last 2 finalized hours, substantially reducing prior full-720-hour overheadpublic_status_hourly_rollupstable with correct indexes for read queries; missing index onbucket_endwhich is used in the cleanup DELETEpublic_status_hourly_rollupsmatching the SQL migration; schema and indexes look consistentSequence Diagram
sequenceDiagram participant Route as GET /api/public-status participant RS as read-store participant DB as PostgreSQL hourly_rollups participant Redis as Redis current-hour cache participant RW as rebuild-worker Route->>RS: readPublicStatusPayload(configVersion=public_v, configSnapshot=internal) RS->>DB: readPublicStatusHourlyRollups(configVersion=public_v) DB-->>RS: historical rows (0 if version mismatch) RS->>Redis: readCurrentHourPublicStatusSummary(configVersion=public_v) Redis-->>RS: current-hour rows (miss if version mismatch) alt rows found RS-->>Route: buildPublicStatusPayloadFromHourlyRollups() else no rows RS->>Redis: read manifest/snapshot RS-->>Route: legacy payload or rebuilding end Note over RW: On rebuild cycle RW->>DB: findHoursNeedingRollupRefresh(configVersion=internal_v) DB-->>RW: missing/recent hours RW->>DB: upsertPublicStatusHourlyRollups(configVersion=internal_v) RW->>Redis: writeCurrentHourPublicStatusSummary(configVersion=internal_v) RW->>DB: cleanupPublicStatusHourlyRollups DELETE WHERE bucket_end <= cutoffComments Outside Diff (3)
src/lib/public-status/rebuild-worker.ts, line 299-330 (link)REBUILD_LOCK_TTL_MSis 60 s, but processing 720 hourly buckets (30 days × 24) in batches of 6 means 120 sequentialPromise.allrounds. If each DB round-trip takes ≥ 0.5 s on the first run, the lock expires while work is still in progress. Another instance then acquires the lock and begins a competing rebuild. Because theinFlightRebuildsmap is per-process, it provides no protection across instances — both would query the requests table for all 720 hours simultaneously, multiplying DB load. The Lua release script safely prevents lock collision, but the concurrent re-scan is the real issue. Consider setting the TTL to at leastROLLUP_WRITE_BATCH_SIZE × expected-batch-latency-ms × (ROLLUP_HISTORY_HOURS / ROLLUP_WRITE_BATCH_SIZE)or extending the lock withKEEPTTL/periodic refresh during the loop.Prompt To Fix With AI
src/app/api/public-status/route.ts, line 17-43 (link)The route passes
configVersion: configSnapshot?.configVersion(the PUBLIC snapshot's version) intoreadPublicStatusPayload, which forwards it to bothreadPublicStatusHourlyRollups(as a WHERE filter) andreadCurrentHourPublicStatusSummary(as part of the Redis key). However, the rebuild worker writes rollups and the current-hour Redis key usingconfigSnapshot.configVersionfromreadCurrentInternalPublicStatusConfigSnapshot(the INTERNAL snapshot's version). If the public and internal config versions ever differ, every historical DB query returns zero rows and the current-hour Redis lookup misses its key — causing the DB path to silently fall through to the Redis manifest path on every request, making the new rollup layer non-functional. The integration test inroute-redis-only.test.tseven uses distinct values ("cfg-public"vs"cfg-internal") and only verifies call parameters, not the end-to-end data flow, so this mismatch is not caught by tests.Prompt To Fix With AI
src/app/api/public-status/route.ts, line 17-43 (link)The route passes
configVersion: configSnapshot?.configVersion(the PUBLIC snapshot's version) intoreadPublicStatusPayload, which forwards it to bothreadPublicStatusHourlyRollups(as a WHERE filter) andreadCurrentHourPublicStatusSummary(as part of the Redis key). However, the rebuild worker writes rollups and the current-hour Redis key usingconfigSnapshot.configVersionfromreadCurrentInternalPublicStatusConfigSnapshot(the INTERNAL snapshot's version). If the public and internal config versions ever differ, every historical DB query returns zero rows and the current-hour Redis lookup misses — causing the new rollup layer to be non-functional and silently falling through to the legacy Redis snapshot path on every request.Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "fix(public-status): address rollup revie..." | Re-trigger Greptile