fix: disabled-key 429 lockout, slow availability page, public-status Redis leak#1187
Conversation
Disabled keys hit the proxy pre-auth rate limiter on every request, incremented the failure counter, and locked the IP/key out with HTTP 429 "Too many authentication failures" after 20 attempts. The legitimate "key disabled" 401 was masked forever once the lockout tripped. Introduce `resolveApiKeyAuthOutcome` returning a discriminated union (`not_found` / `key_disabled` / `key_expired` / ok), map each reason to its own 401 error, and only feed `credentials` failures to the rate-limiter — admin-disabled or expired keys/users now bypass it entirely. `validateApiKeyAndGetUser` stays as a backwards-compatible wrapper. The /v1/models chain is updated the same way. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The availability dashboard query was slow because the WHERE clause called the PL/pgSQL function `fn_is_message_request_finalized(...)`. PostgreSQL never inlines PL/pgSQL functions, so the predicate became opaque to the planner and the partial index `idx_message_request_provider_created_at_finalized_active` (predicate `status_code IS NOT NULL AND deleted_at IS NULL`) was no longer usable — the dashboard fell back to a sequential scan that re-evaluated the function per row. Inline a semantically-equivalent SQL expression so `status_code IS NOT NULL` becomes the dominant SARGable branch. The SQL function definition is unchanged (still called by the upsert trigger on the write path); a header comment marks the keep-in-sync requirement. Tests assert on the inlined form to guard against the slow function-call form returning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough该 PR 引入结构化 API 密钥鉴权结果(resolveApiKeyAuthOutcome)并在代理守卫与模型端点按失败类型区分是否计入暴力破解限流;将 availability 的 finalized 判定内联为可索引的 SQL 谓词;为 public-status 快照写入引入 Redis TTL 并更新相关测试与接口提示。 ChangesAPI 密钥认证结果 API 与限流集成
数据库查询优化
Public-status Redis TTL
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5d7449d87
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| .from(keys) | ||
| .innerJoin(users, eq(keys.userId, users.id)) | ||
| .where( | ||
| and( | ||
| eq(keys.key, keyString), | ||
| isNull(keys.deletedAt), | ||
| eq(keys.isEnabled, true), | ||
| or(isNull(keys.expiresAt), gt(keys.expiresAt, new Date())), | ||
| isNull(users.deletedAt) | ||
| ) | ||
| ); | ||
| .where(and(eq(keys.key, keyString), isNull(keys.deletedAt), isNull(users.deletedAt))); |
There was a problem hiding this comment.
Make auth lookup deterministic when duplicate key rows exist
Because this query now returns all non-deleted rows for a key string and later uses result[0], authentication outcome depends on arbitrary row order when duplicate keys.key values exist (the schema has no unique constraint on keys.key). A disabled/expired duplicate can be picked before an active duplicate, causing valid credentials to be rejected as key_disabled/key_expired. Before this change, the SQL filter only returned active rows, so this mixed-state false rejection path was not present.
Useful? React with 👍 / 👎.
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
Code Review
This pull request refactors the API key authentication logic by introducing resolveApiKeyAuthOutcome, which distinguishes between missing, disabled, and expired keys. This change allows the ProxyAuthenticator to skip rate-limiting for administrative account-state failures, preventing potential lockouts. Additionally, the availability-service.ts was optimized by inlining SQL logic to improve query planning and index utilization. Feedback suggests extracting hardcoded status reasons into a shared constant to mitigate maintenance risks.
| function buildAvailabilityFinalizedCondition() { | ||
| return sql`fn_is_message_request_finalized( | ||
| ${messageRequest.blockedBy}, | ||
| ${messageRequest.statusCode}, | ||
| ${messageRequest.providerChain}, | ||
| ${messageRequest.errorMessage} | ||
| // The `IS NOT NULL` checks below are individually SARGable. Listing | ||
| // status_code first encourages the planner to scan the partial index. | ||
| return sql`( | ||
| ${messageRequest.statusCode} IS NOT NULL | ||
| OR ${messageRequest.blockedBy} IS NOT NULL | ||
| OR COALESCE(${messageRequest.errorMessage}, '') <> '' | ||
| OR ( | ||
| ${messageRequest.providerChain} IS NOT NULL | ||
| AND jsonb_typeof(${messageRequest.providerChain}) = 'array' | ||
| AND jsonb_array_length(${messageRequest.providerChain}) > 0 | ||
| AND jsonb_typeof(${messageRequest.providerChain} -> -1) = 'object' | ||
| AND ( | ||
| (${messageRequest.providerChain} -> -1 ->> 'reason') IN ( | ||
| 'request_success', 'retry_success', 'retry_failed', 'system_error', | ||
| 'resource_not_found', 'client_error_non_retryable', | ||
| 'concurrent_limit_failed', 'hedge_winner', 'hedge_loser_cancelled', | ||
| 'client_abort' | ||
| ) | ||
| OR ( | ||
| (${messageRequest.providerChain} -> -1 ? 'statusCode') | ||
| AND jsonb_typeof(${messageRequest.providerChain} -> -1 -> 'statusCode') = 'number' | ||
| ) | ||
| OR COALESCE(${messageRequest.providerChain} -> -1 ->> 'errorMessage', '') <> '' | ||
| ) | ||
| ) | ||
| )`; | ||
| } |
There was a problem hiding this comment.
The inlining of the fn_is_message_request_finalized logic into this SQL template is a significant performance improvement as it allows the PostgreSQL planner to utilize partial indexes. However, this creates a maintenance risk as the logic is now duplicated between the TypeScript code and the PL/pgSQL function in the database.
Consider extracting the list of finalized reasons (lines 97-100) into a shared constant at the top of the file to improve readability and ensure consistency if these reasons are used elsewhere in the service.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/v1/_lib/models/available-models.ts`:
- Around line 66-67: The thrown error responses in available-models.ts (the
throw c.json calls that return messages like "API 密钥已被禁用" and the other messages
around the same block at lines ~69-73) must stop returning hardcoded user-facing
strings and instead return i18n keys; update the code that throws these errors
(the throw c.json(...) statements) to use defined translation keys (e.g.,
models.key_disabled, models.invalid_key, etc.) and ensure the response payload
carries the key (and optional interpolation data) so the central translation
layer will render the localized text for zh-CN/zh-TW/en/ja/ru before sending to
the client; keep the error structure ({ error: { messageKey: "...", type: "..."
} } or follow the project’s established i18n error contract) and replace all
hardcoded messages in that block (including the ones referenced at 69-73)
accordingly.
In `@src/app/v1/_lib/proxy/auth-guard.ts`:
- Around line 188-191: The 401 error message passed to ProxyResponses.buildError
is hardcoded; replace literal user-facing strings in auth-guard.ts (the
ProxyResponses.buildError calls for the invalid API key / disabled / expired
branches) with i18n keys (e.g. use keys for "invalid_api_key", "key_disabled",
"key_expired") and pass those keys into the response payload so the translation
layer renders the localized text; update the three affected calls (the
buildError invocations around the invalid_api_key, key_disabled, and key_expired
branches) and ensure corresponding entries are added to the i18n resources for
zh-CN, zh-TW, en, ja, and ru.
In `@src/lib/availability/availability-service.ts`:
- Around line 91-106: The SQL predicate using ${messageRequest.providerChain}
relies on short-circuiting and can still call jsonb_array_length/jsonb_typeof on
non-array values causing query errors; change the WHERE fragment to use a CASE
(or a NULLIF/WHEN) guard that first checks
jsonb_typeof(${messageRequest.providerChain}) = 'array' and only then evaluates
jsonb_array_length(...) and the -> -1 expressions so evaluation order is forced,
and add a regression test feeding a non-array provider_chain (object, scalar,
and JSON null) to the code path that builds this query to assert it no longer
throws a 500.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 979c59c4-0c49-4384-bf62-b28339970251
📒 Files selected for processing (12)
src/app/v1/_lib/models/available-models.tssrc/app/v1/_lib/proxy/auth-guard.tssrc/app/v1/_lib/proxy/session.tssrc/lib/availability/availability-service.tssrc/repository/index.tssrc/repository/key.tstests/unit/lib/availability-service.test.tstests/unit/models/available-models-gemini-key.test.tstests/unit/proxy/auth-guard-account-state.test.tstests/unit/proxy/auth-guard-precheck.test.tstests/unit/proxy/available-models.test.tstests/unit/repository/key-resolve-outcome.test.ts
| export interface AuthState { | ||
| user: User | null; | ||
| key: Key | null; | ||
| apiKey: string | null; | ||
| success: boolean; | ||
| errorResponse?: Response; // 认证失败时的详细错误响应 | ||
| /** | ||
| * Set when `success` is false. Determines whether the proxy auth guard | ||
| * records the failure against the IP/key rate-limiter. | ||
| */ | ||
| failureKind?: AuthFailureKind; | ||
| } |
There was a problem hiding this comment.
failureKind should be required on failure states
AuthState uses a flat success: boolean shape, so TypeScript does not enforce that failureKind is always present when success is false. The guard check authState.failureKind !== "account_state" silently treats undefined as a credentials failure — the safe default, but only by accident. Any future failure branch that forgets the field will record against the rate-limiter without any compile-time warning. Modeling this as a discriminated union would make the constraint explicit and prevent the footgun entirely.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/session.ts
Line: 47-58
Comment:
**`failureKind` should be required on failure states**
`AuthState` uses a flat `success: boolean` shape, so TypeScript does not enforce that `failureKind` is always present when `success` is `false`. The guard check `authState.failureKind !== "account_state"` silently treats `undefined` as a `credentials` failure — the safe default, but only by accident. Any future failure branch that forgets the field will record against the rate-limiter without any compile-time warning. Modeling this as a discriminated union would make the constraint explicit and prevent the footgun entirely.
How can I resolve this? If you propose a fix, please make it concise.| // outcome.reason === "key_expired" | ||
| logger.warn("[ProxyAuthenticator] API key has expired", { | ||
| apiKeyLength: apiKey.length, | ||
| fromHeader: !!headers.authHeader || !!headers.apiKeyHeader || !!headers.geminiApiKeyHeader, | ||
| fromQuery: !!headers.geminiApiKeyQuery, | ||
| }); | ||
| return { | ||
| user: null, | ||
| key: null, | ||
| apiKey, | ||
| success: false, | ||
| failureKind: "account_state", | ||
| errorResponse: ProxyResponses.buildError( | ||
| 401, | ||
| "API 密钥无效。提供的密钥不存在、已被删除、已被禁用或已过期。", | ||
| "invalid_api_key" | ||
| "API 密钥已过期。请联系管理员续期或更换密钥。", | ||
| "key_expired" | ||
| ), | ||
| }; |
There was a problem hiding this comment.
Non-exhaustive fallthrough on
outcome.reason
The comment // outcome.reason === "key_expired" is correct today because the union has exactly three members, but there is no compile-time exhaustiveness check. If a new ApiKeyAuthFailureReason is ever added (e.g., "key_revoked"), the new variant silently falls into the key_expired handler, sends the wrong error message, and — critically — classifies the failure as account_state when it might need to be credentials. A switch with an assertNever exhaustiveness guard, or an explicit else branch that asserts the type, would surface this at build time instead.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/auth-guard.ts
Line: 214-229
Comment:
**Non-exhaustive fallthrough on `outcome.reason`**
The comment `// outcome.reason === "key_expired"` is correct today because the union has exactly three members, but there is no compile-time exhaustiveness check. If a new `ApiKeyAuthFailureReason` is ever added (e.g., `"key_revoked"`), the new variant silently falls into the `key_expired` handler, sends the wrong error message, and — critically — classifies the failure as `account_state` when it might need to be `credentials`. A `switch` with an `assertNever` exhaustiveness guard, or an explicit `else` branch that asserts the type, would surface this at build time instead.
How can I resolve this? If you propose a fix, please make it concise.| OR ( | ||
| (${messageRequest.providerChain} -> -1 ? 'statusCode') | ||
| AND jsonb_typeof(${messageRequest.providerChain} -> -1 -> 'statusCode') = 'number' | ||
| ) |
There was a problem hiding this comment.
Raw
? JSONB operator in Drizzle sql template
(${messageRequest.providerChain} -> -1 ? 'statusCode') uses the PostgreSQL JSONB key-existence operator ?. With the pg driver (which parameterizes via $N) the raw ? passes through to the server unchanged and works correctly. However, some connection-pooling middlewares or alternative drivers (e.g., postgres.js) treat bare ? as a positional-parameter placeholder and will mangle the query. Consider adding a brief comment documenting the driver assumption here, in case the underlying driver is ever swapped.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/availability/availability-service.ts
Line: 102-105
Comment:
**Raw `?` JSONB operator in Drizzle `sql` template**
`(${messageRequest.providerChain} -> -1 ? 'statusCode')` uses the PostgreSQL JSONB key-existence operator `?`. With the `pg` driver (which parameterizes via `$N`) the raw `?` passes through to the server unchanged and works correctly. However, some connection-pooling middlewares or alternative drivers (e.g., `postgres.js`) treat bare `?` as a positional-parameter placeholder and will mangle the query. Consider adding a brief comment documenting the driver assumption here, in case the underlying driver is ever swapped.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Code Review Summary
This PR fixes two distinct production issues: a 429 lockout caused by disabled API keys and a slow availability dashboard due to a non-inlinable PL/pgSQL predicate. Both fixes are well-reasoned, thoroughly tested, and the auth discriminated-union refactor correctly isolates brute-force signals from admin actions.
PR Size: L
- Lines changed: 744 (685 additions, 59 deletions)
- Files changed: 12
- Split suggestion: This PR contains two independent fixes that could ship separately: (1) the auth-chain rate-limiter routing (7 files, ~300 lines), and (2) the availability predicate inlining (2 files, ~100 lines). Splitting would make rollback safer if either fix causes unexpected issues.
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 1 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 1 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
High Priority Issues (Should Fix)
1. Non-exhaustive ApiKeyAuthFailureReason handling in auth-guard.ts
File: src/app/v1/_lib/proxy/auth-guard.ts:214
Confidence: 95/100
The sequential if statements handling outcome.reason lack a compile-time exhaustiveness guard. If ApiKeyAuthFailureReason is ever expanded (e.g., adding "key_revoked"), the new variant silently falls through to the key_expired handler. This misclassifies the failure as account_state (bypassing the rate limiter) and returns the wrong error message. The same pattern exists in available-models.ts (line 71) but with less severe consequences since it falls through to invalid_api_key (credentials class).
Suggested fix: Convert to an if/else if chain with a never exhaustiveness guard:
if (\!outcome.ok) {
if (outcome.reason === "not_found") {
// ... existing handler
} else if (outcome.reason === "key_disabled") {
// ... existing handler
} else if (outcome.reason === "key_expired") {
// ... existing handler
} else {
const _exhaustiveCheck: never = outcome.reason;
throw new Error(`Unhandled auth failure reason: ${_exhaustiveCheck}`);
}
}Review Coverage
- Logic and correctness - One exhaustiveness fragility identified
- Security (OWASP Top 10) - Clean; auth failure classification is sound
- Error handling - Clean; all failure paths log and return precise errors
- Type safety - One type-union exhaustiveness gap in auth-guard.ts
- Documentation accuracy - Clean; excellent comments on inlined SQL predicate
- Test coverage - Adequate; 13 new unit tests cover all auth outcomes + regression
- Code clarity - Good; discriminated union refactor improves readability
Automated review by Claude AI
|
|
||
| if (!authResult) { | ||
| logger.debug("[ProxyAuthenticator] API key validation failed", { | ||
| // outcome.reason === "key_expired" |
There was a problem hiding this comment.
[High] [LOGIC-BUG] Non-exhaustive ApiKeyAuthFailureReason handling
Why this is a problem: The sequential if statements for outcome.reason lack a compile-time exhaustiveness guard. If ApiKeyAuthFailureReason is ever expanded (e.g., adding "key_revoked"), the new variant silently falls through to the key_expired handler below. This returns the wrong error message and misclassifies the failure as account_state, bypassing the brute-force rate limiter.
Suggested fix: Convert to if/else if with a never exhaustiveness guard in the final else branch.
- key.ts: classify duplicate-row matches deterministically. The relaxed WHERE clause can return multiple non-deleted rows for one key string (no unique constraint on keys.key) and result[0] was non-deterministic; prefer an active row, fall back to "any enabled = key_expired", else key_disabled. (chatgpt-codex P1) - availability-service.ts: wrap the provider_chain jsonb branch in a CASE expression so jsonb_array_length cannot run on a non-array row. PostgreSQL does not guarantee AND short-circuit, so a single non-array historical row would otherwise crash the dashboard query. Extract the finalized provider_chain reason list into FINALIZED_PROVIDER_CHAIN_REASONS and document the JSONB `?` operator's driver assumption. (coderabbit P2, gemini, greptile) - auth-guard.ts / available-models.ts: convert the outcome.reason branch to an exhaustive switch with assertNever, and introduce a buildAuthFailure factory so every failure path is forced to tag its failureKind at compile time. Adding a new ApiKeyAuthFailureReason now produces a TypeScript error until the new branch is handled. (greptile P2 ×2) - Tests cover the duplicate-row cases (ok / key_expired / key_disabled across mixed-state rows) and assert the CASE guard appears in the generated SQL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed review feedback in d314561: Bugs fixed
Nice-to-have
Deferred (with reason)
Local pre-push: build ✓, typecheck ✓, lint ✓, all 5960 unit tests ✓. |
🧪 测试结果
总体结果: ✅ 所有测试通过 |
| const outcome = await resolveApiKeyAuthOutcome(apiKey); | ||
| if (!outcome.ok) { | ||
| // Exhaustive switch: see auth-guard.ts for rationale. Adding a new | ||
| // ApiKeyAuthFailureReason will produce a TypeScript error on the |
There was a problem hiding this comment.
[HIGH] [TEST-MISSING-CRITICAL] /v1/models key_disabled / key_expired branches have no unit coverage
Why this is a problem: This PR adds new auth behavior in authenticateRequest (distinct 401 key_disabled vs key_expired). There is currently no unit test that asserts these new branches, so a regression back to a generic invalid_api_key (or other behavior) would go undetected. Guideline: 2. **Test Coverage** - All new features must have unit test coverage of at least 80%.
Suggested fix:
// tests/unit/models/available-models-auth-outcome.test.ts
import { describe, expect, it, vi } from "vitest";
vi.mock("@/repository/key", () => ({
resolveApiKeyAuthOutcome: vi.fn(),
}));
function makeCtx(apiKey: string) {
return {
req: {
path: "/v1/models",
header: (name: string) => (name.toLowerCase() === "x-api-key" ? apiKey : undefined),
query: () => undefined,
},
json: (body: unknown, status?: number) =>
new Response(JSON.stringify(body), {
status: status ?? 200,
headers: { "content-type": "application/json" },
}),
} as any;
}
describe("handleAvailableModels auth outcomes", () => {
it("returns 401 key_disabled", async () => {
const { resolveApiKeyAuthOutcome } = await import("@/repository/key");
vi.mocked(resolveApiKeyAuthOutcome).mockResolvedValueOnce({ ok: false, reason: "key_disabled" });
const { handleAvailableModels } = await import("@/app/v1/_lib/models/available-models");
const response = await handleAvailableModels(makeCtx("sk-disabled"));
expect(response.status).toBe(401);
const payload = (await response.json()) as { error: { type: string } };
expect(payload.error.type).toBe("key_disabled");
});
it("returns 401 key_expired", async () => {
const { resolveApiKeyAuthOutcome } = await import("@/repository/key");
vi.mocked(resolveApiKeyAuthOutcome).mockResolvedValueOnce({ ok: false, reason: "key_expired" });
const { handleAvailableModels } = await import("@/app/v1/_lib/models/available-models");
const response = await handleAvailableModels(makeCtx("sk-expired"));
expect(response.status).toBe(401);
const payload = (await response.json()) as { error: { type: string } };
expect(payload.error.type).toBe("key_expired");
});
});| 401, | ||
| "提供了多个冲突的 API 密钥。请仅使用一种认证方式。", | ||
| "authentication_error" | ||
| ), |
There was a problem hiding this comment.
[MEDIUM] [STANDARD-VIOLATION] New auth error messages hardcode user-facing text (bypasses i18n)
Why this is a problem: Guideline: 3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Never hardcode display text. The new responses embed hardcoded message strings (e.g. "API 密钥无效。提供的密钥不存在或已被删除。"), which cannot be localized and will leak a single-language UX into non-zh locales.
Suggested fix:
// src/app/v1/_lib/proxy/auth-guard.ts
const { getLocale, getTranslations } = await import("next-intl/server");
const locale = await getLocale();
const t = await getTranslations({ locale, namespace: "auth" });
return {
user: null,
key: null,
apiKey,
success: false,
failureKind: "credentials",
errorResponse: ProxyResponses.buildError(
401,
t("errors.apiKeyNotFoundOrDeleted"),
"invalid_api_key"
),
};Add the new keys to messages/{locale}/auth.json under errors (e.g. apiKeyNotFoundOrDeleted, apiKeyDisabled, apiKeyExpired) so all supported locales are covered.
There was a problem hiding this comment.
Code Review Summary
This PR combines two substantial changes (proxy auth error classification and availability dashboard query perf) into a single large review surface.
Two issues need follow-up: the new /v1/models auth outcome branches are not unit-tested, and the new auth error responses add additional hardcoded user-facing strings despite the repo i18n requirement.
PR Size: L
- Lines changed: 744
- Files changed: 12
- Split suggestion: separate the auth-chain changes (key lookup outcome + proxy auth guard + tests) from the availability dashboard perf changes (availability finalized predicate + tests).
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 1 | 0 |
| Tests | 0 | 1 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Critical Issues (Must Fix)
- None.
High Priority Issues (Should Fix)
- [HIGH] [TEST-MISSING-CRITICAL] Missing unit coverage for new /v1/models
key_disabled/key_expiredbranches (src/app/v1/_lib/models/available-models.ts:66). Confidence: 90. - [MEDIUM] [STANDARD-VIOLATION] New auth error messages hardcode user-facing strings instead of i18n (
src/app/v1/_lib/proxy/auth-guard.ts:190). Confidence: 80.
Review Coverage
- Logic and correctness
- Security (OWASP Top 10)
- Error handling
- Type safety
- Documentation accuracy
- Test coverage
- Code clarity
Automated review by Codex AI
There was a problem hiding this comment.
- Identified PR
#1187and applied labelsize/L(744 lines changed across 12 files). - Posted 2 inline review comments on diff lines:
src/app/v1/_lib/models/available-models.ts:66— [HIGH] [TEST-MISSING-CRITICAL] missing unit coverage for newkey_disabled/key_expiredbranches.src/app/v1/_lib/proxy/auth-guard.ts:190— [MEDIUM] [STANDARD-VIOLATION] new hardcoded user-facing auth error messages bypass i18n requirement.
- Submitted the required PR review summary via
gh pr review --comment(includes PR size, split suggestion, issue table, and confidence notes).
publishPublicStatusConfigSnapshot, publishInternalPublicStatusConfigSnapshot, and publishCurrentPublicStatusConfigPointers all wrote keys with bare redis.set(...) and no TTL. Every config-version mint (provider/group/system settings change) created a new versioned snapshot key that never expired — on a busy operator the public-status:v1:config:* and config-internal:* key namespaces grow without bound. Neighbouring projection writers in rebuild-worker.ts already use a 30-day TTL via setWithTtl; only the config publishing path was missed when that pattern was introduced (#1056). Add PUBLIC_STATUS_CONFIG_TTL_SECONDS (30 days, matching GENERATION_PROJECTION_TTL_SECONDS in rebuild-worker.ts), widen the local RedisWriter type to the (key, value, "EX", seconds) ioredis overload, and apply the TTL to all four call sites — including the Lua script used by the pointer publisher so SET ... EX is atomic with the version compare. Each successful publish refreshes the TTL on the live pointer keys, so as long as configs are published at least every 30 days the active pointer never expires while stale versioned snapshots get cleaned up naturally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed Root cause traced to #1056: Bundling this into the existing PR since all three bugs come from the same user report batch and are independent file-scoped fixes. Happy to split into a separate PR if preferred. PR title and body updated accordingly. Pre-push re-run: build ✓, typecheck ✓, lint ✓, 5964 tests ✓. |
🧪 测试结果
总体结果: ✅ 所有测试通过 |
…tcomes CodeRabbit and CI flagged two remaining gaps in the previous fix: - The new 401 messages for invalid_api_key / key_disabled / key_expired were hardcoded Chinese strings, ignoring the project's i18n guideline (5 locales: zh-CN, zh-TW, en, ja, ru). The pre-existing strings in the same files (e.g. user_disabled, user_expired) were already hardcoded before this PR and remain so — see PR scope note — but the new branches should follow the established pattern. - /v1/models had no unit coverage for the new key_disabled / key_expired branches in handleAvailableModels.authenticateRequest. A regression back to a generic invalid_api_key would have gone undetected. Add PROXY_INVALID_API_KEY, PROXY_API_KEY_DISABLED, PROXY_API_KEY_EXPIRED codes to AUTH_ERRORS plus translations across all 5 locales. Wire auth-guard.ts and available-models.ts through getErrorMessageServer + next-intl/server's getLocale — same pattern the neighbouring rate-limit-guard.ts already uses. Add tests/unit/models/available-models-auth-outcome.test.ts covering all five 401 branches (key_disabled, key_expired, not_found, user_disabled, user_expired). Existing auth-guard tests mock next-intl/server + getErrorMessageServer so the unit tests can run outside a Next.js request context, and message assertions pin on the ERROR_CODES key (the localized text lives in messages/<locale>/errors.json). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed [HIGH] TEST-MISSING-CRITICAL — [MEDIUM] STANDARD-VIOLATION — Hardcoded auth error messages The auth-guard test suite mocks Pre-push: build ✓, typecheck ✓, lint ✓, 5969 unit tests ✓. |
| } | ||
|
|
||
| interface RedisWriter { | ||
| // ioredis supports both bare `set(key, value)` and the EX-variant | ||
| // `set(key, value, "EX", seconds)`. Widening the type here lets us pass | ||
| // an explicit TTL on every write — see PUBLIC_STATUS_CONFIG_TTL_SECONDS. | ||
| set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown> | unknown; | ||
| set(key: string, value: string): Promise<unknown> | unknown; | ||
| get?(key: string): Promise<string | null> | string | null; |
There was a problem hiding this comment.
Current-pointer keys will expire on idle deployments
buildPublicStatusConfigSnapshotKey(), buildPublicStatusInternalConfigSnapshotKey(), and buildPublicStatusConfigVersionPointerKey() are current pointer keys — they always point at the latest config and should persist until explicitly overwritten. Applying the same 30-day TTL to them as to the accumulating versioned snapshot keys means that any deployment that goes 30 days without a config change loses its pointer keys, causing readCurrentPublicStatusConfigSnapshot to return null and the public status page to silently show no config data.
The versioned snapshot keys (public-status:v1:config:<version>) are the ones that accumulate and need a TTL; the three pointer keys should either carry no TTL or a much longer one (e.g., 1 year), since they are overwritten atomically on every config publish and never accumulate.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/public-status/config-snapshot.ts
Line: 82-90
Comment:
**Current-pointer keys will expire on idle deployments**
`buildPublicStatusConfigSnapshotKey()`, `buildPublicStatusInternalConfigSnapshotKey()`, and `buildPublicStatusConfigVersionPointerKey()` are *current* pointer keys — they always point at the latest config and should persist until explicitly overwritten. Applying the same 30-day TTL to them as to the accumulating versioned snapshot keys means that any deployment that goes 30 days without a config change loses its pointer keys, causing `readCurrentPublicStatusConfigSnapshot` to return `null` and the public status page to silently show no config data.
The versioned snapshot keys (`public-status:v1:config:<version>`) are the ones that accumulate and need a TTL; the three pointer keys should either carry no TTL or a much longer one (e.g., 1 year), since they are overwritten atomically on every config publish and never accumulate.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c983b6402a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| buildPublicStatusConfigSnapshotKey(), | ||
| JSON.stringify({ key, configVersion: snapshot.configVersion }) | ||
| JSON.stringify({ key, configVersion: snapshot.configVersion }), | ||
| "EX", | ||
| PUBLIC_STATUS_CONFIG_TTL_SECONDS |
There was a problem hiding this comment.
Keep public-status current pointers from expiring
Applying EX PUBLIC_STATUS_CONFIG_TTL_SECONDS to the config:current pointer means a stable deployment with no config publishes for 30 days will lose its only pointer to the latest snapshot, after which readCurrentPublicStatusConfigSnapshot falls through to null and endpoints like /api/public-site-meta report projection_missing. The leak fix only needs TTL on versioned snapshot keys; expiring the single “current” pointer key introduces a time-based availability regression.
Useful? React with 👍 / 👎.
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/models/available-models-auth-outcome.test.ts (1)
157-157: ⚡ Quick win将
as never替换为显式类型或satisfies以保持类型约束Line 157 和 Line 173 中的
as never会绕过 TypeScript 的类型检查,降低测试对resolveApiKeyAuthOutcome返回类型变更的捕获能力。建议改用显式类型或satisfies ApiKeyAuthOutcome来确保 mock 返回值始终与实际类型定义保持同步。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/models/available-models-auth-outcome.test.ts` at line 157, The test currently uses "as never" to cast mocked return values which disables TypeScript checks; instead update the mocks passed to resolveApiKeyAuthOutcome to use an explicit type or the "satisfies ApiKeyAuthOutcome" operator so the returned shape matches ApiKeyAuthOutcome—locate the mock objects around resolveApiKeyAuthOutcome calls (previously cast with "as never") and replace those casts with either a typed annotation or "... } satisfies ApiKeyAuthOutcome" to restore compile-time validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/models/available-models-auth-outcome.test.ts`:
- Line 157: The test currently uses "as never" to cast mocked return values
which disables TypeScript checks; instead update the mocks passed to
resolveApiKeyAuthOutcome to use an explicit type or the "satisfies
ApiKeyAuthOutcome" operator so the returned shape matches
ApiKeyAuthOutcome—locate the mock objects around resolveApiKeyAuthOutcome calls
(previously cast with "as never") and replace those casts with either a typed
annotation or "... } satisfies ApiKeyAuthOutcome" to restore compile-time
validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9b790b2e-362d-4897-9836-7095e18cd202
📒 Files selected for processing (11)
messages/en/errors.jsonmessages/ja/errors.jsonmessages/ru/errors.jsonmessages/zh-CN/errors.jsonmessages/zh-TW/errors.jsonsrc/app/v1/_lib/models/available-models.tssrc/app/v1/_lib/proxy/auth-guard.tssrc/lib/utils/error-messages.tstests/unit/models/available-models-auth-outcome.test.tstests/unit/proxy/auth-guard-account-state.test.tstests/unit/proxy/auth-guard-precheck.test.ts
✅ Files skipped from review due to trivial changes (5)
- messages/en/errors.json
- messages/zh-TW/errors.json
- messages/zh-CN/errors.json
- messages/ja/errors.json
- messages/ru/errors.json
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/unit/proxy/auth-guard-precheck.test.ts
- src/app/v1/_lib/proxy/auth-guard.ts
- src/app/v1/_lib/models/available-models.ts
- tests/unit/proxy/auth-guard-account-state.test.ts
greptile flagged a P1: applying the same 30-day TTL to the three "current pointer" keys would dark out any deployment that goes longer than the TTL without publishing a new config. Pointer keys don't accumulate — only one entry per pointer name exists, overwritten atomically on every publish — so they MUST persist until explicitly overwritten. Only the versioned snapshot keys (`public-status:v1:config:<version>` and `:config-internal:<version>`) keep the 30-day TTL — those are the ones that accumulate as new config versions are minted. The pointer publisher's Lua script and JS fallback now write bare `SET` without `EX`. Tests assert the split: the versioned write carries `EX <ttl>` and the pointer write is a bare two-arg `set(key, value)` with no TTL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cd75941a4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (redis) { | ||
| await redis.set(key, JSON.stringify(snapshot)); | ||
| // Versioned snapshot key: TTL'd so old versions get cleaned up. | ||
| await redis.set(key, JSON.stringify(snapshot), "EX", PUBLIC_STATUS_CONFIG_TTL_SECONDS); |
There was a problem hiding this comment.
Keep latest public config snapshot from expiring
Applying EX PUBLIC_STATUS_CONFIG_TTL_SECONDS to the versioned config snapshot means the currently referenced snapshot disappears after 30 days if no config republish occurs. In the normal publish flow (publishCurrentPublicStatusConfigProjection), both publish*ConfigSnapshot calls pass setCurrentPointer: false, so reads depend on configVersion + public-status:v1:config:<version>; once that versioned key expires, readCurrentPublicStatusConfigSnapshot returns null and endpoints like /api/public-site-meta fall back to projection_missing until another rebuild/publish path runs.
Useful? React with 👍 / 👎.
🧪 测试结果
总体结果: ✅ 所有测试通过 |
Summary
Fixes three user-reported bugs:
095c607e(fix(metrics): align success-rate outcome semantics #1080) replaced the SARGablestatus_code IS NOT NULLfinalized predicate with afn_is_message_request_finalized(...)PL/pgSQL call. PostgreSQL never inlines PL/pgSQL, so the partial indexidx_message_request_provider_created_at_finalized_activewas no longer usable and the query degraded to a sequential scan.b2497a46(feat: add redis-projected public status page #1056) added Redis-backed config snapshots written via bareredis.set(...)(no TTL). Every config-version mint (provider / group / system settings change) created a new versioned snapshot key that never expired —public-status:v1:config:*and the internal variant grew without bound on busy operators. Neighbouring projection writers inrebuild-worker.tswere already using a 30-day TTL; only the config publishing path was missed.Changes
Auth chain (
fix(auth))resolveApiKeyAuthOutcomeinsrc/repository/key.tsreturning a discriminated union (not_found/key_disabled/key_expired/ ok).validateApiKeyAndGetUseris kept as a back-compat wrapper. Duplicate-row handling:keys.keyhas no unique constraint; the classifier prefers an active duplicate, falls back tokey_expiredwhen any duplicate is enabled, elsekey_disabled.ProxyAuthenticator(proxy auth guard) maps each reason to a distinct 401 (invalid_api_key,key_disabled,key_expired,user_disabled,user_expired). The branch is an exhaustiveswitchwithassertNever— adding a newApiKeyAuthFailureReasonproduces a TS error until the new branch is handled.AuthFailureKindtag onAuthState; the guard only feedscredentialsfailures (missing/mismatched/unknown key) into the brute-force rate limiter. Admin-disabled or expired keys/users bypass it entirely. AbuildAuthFailurefactory forces every failure call site to tag bothfailureKindanderrorResponseat compile time./v1/modelschain (available-models.ts) is updated the same way.Availability dashboard (
perf(availability))buildAvailabilityFinalizedConditioninlines a semantically-equivalent SQL expression sostatus_code IS NOT NULLbecomes the dominant SARGable branch and the planner can use the partial index. The PL/pgSQL function definition is unchanged (still called by the upsert trigger on the write path); a header comment marks the keep-in-sync requirement.CASEsojsonb_array_lengthcannot evaluate against a non-array historical row (PG does not guarantee AND short-circuit).FINALIZED_PROVIDER_CHAIN_REASONSis extracted as a top-level constant; a comment documents the?JSONB operator's driver assumption.Public status TTL (
fix(public-status))PUBLIC_STATUS_CONFIG_TTL_SECONDS = 60 * 60 * 24 * 30(matchesGENERATION_PROJECTION_TTL_SECONDSinrebuild-worker.ts).redis.set(...)writes inpublishPublicStatusConfigSnapshot,publishInternalPublicStatusConfigSnapshot, andpublishCurrentPublicStatusConfigPointersnow use(key, value, "EX", seconds). The Lua script in the pointer publisher's eval path appliesSET ... EX <ttl>atomically with the version compare so the TTL refreshes on every successful publish.Test plan
tests/unit/proxy/auth-guard-account-state.test.ts(7 cases — disabled key, expired key, disabled user, expired user, unknown key, missing creds, plus a "25 disabled-key attempts never trip 429" regression case)tests/unit/repository/key-resolve-outcome.test.ts(9 cases including 3 dedicated to multi-row duplicate-key scenarios)tests/unit/lib/availability-service.test.tsguard against the slow function-call form returning and assert the newCASEshapetests/unit/public-status/config-snapshot.test.ts(4 cases —publishPublicStatusConfigSnapshot,publishInternalPublicStatusConfigSnapshot, and both Lua / non-Lua pointer-publish paths each assertEX+PUBLIC_STATUS_CONFIG_TTL_SECONDS)bun run build,bun run lint,bun run typecheck,bun run test(5964 tests pass)key_disabledand no 429 lockout/dashboard/availabilityagainst a populated DB, confirm noticeable speed-up vs. devredis-cli --scan --pattern 'public-status:v1:config:*' | xargs -I {} redis-cli TTL {}and confirm every key has a positive TTL🤖 Generated with Claude Code
Greptile Summary
This PR fixes three distinct production bugs: a brute-force rate-limiter incorrectly recording disabled/expired API key failures (causing 429 lockouts), a PL/pgSQL call making the availability dashboard query non-SARGable (sequential scan), and Redis versioned config-snapshot keys accumulating without a TTL.
resolveApiKeyAuthOutcomereturns a discriminated union distinguishingnot_found/key_disabled/key_expired; the proxy guard's exhaustiveswitchmaps each reason to a distinct 401 and gates rate-limiter recording behind afailureKind !== \"account_state\"check. Multi-row duplicate-key handling correctly prefers active rows.buildAvailabilityFinalizedConditioninlines the PL/pgSQL predicate as explicit SQL sostatus_code IS NOT NULLremains SARGable; JSONB array operations are wrapped inCASEto guard against non-array rows.Confidence Score: 4/5
The auth and availability fixes are solid and well-tested; the public-status TTL fix risks silently darkening the status page on idle deployments.
The auth refactor and availability index fix are correct and comprehensively tested. The public-status TTL logic introduces a scenario where an idle deployment with no config change for 30 days loses its versioned snapshot while the pointer key survives, causing the read path to return null with no fallback — a regression from the pre-PR state where no data ever expired.
src/lib/public-status/config-snapshot.ts — the TTL applied to versioned snapshot keys needs to be substantially longer, or the read path needs a fallback when a versioned key has expired.
Comments Outside Diff (1)
src/lib/public-status/config-snapshot.ts, line 234-241 (link)The pointer keys correctly have no TTL, but each pointer stores a reference to a versioned snapshot key that carries the 30-day TTL. In
readCurrentPublicStatusConfigSnapshot, oncecurrentVersionis non-null (the version pointer key has no TTL so it survives indefinitely), the function returns immediately from reading the versioned snapshot — with no fallback if that key has expired. On any deployment that goes 30 days without a config change, the pointer is alive but the snapshot it references has silently expired, causing the status page to returnnulland go dark.The legacy pointer path has the same problem:
buildPublicStatusConfigSnapshotKey()(no TTL) stores a reference to the versioned key path, and that versioned key also expires after 30 days.A straightforward fix is to use a much longer TTL (e.g., one year). Config accumulation remains bounded while idle deployments do not silently lose snapshot data.
Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (5): Last reviewed commit: "fix(public-status): drop TTL from curren..." | Re-trigger Greptile