fix: disabled-key 429 lockout, slow availability page, public-status Redis leak #1187

Merged
ding113 merged 6 commits into dev from fix/provider-availability-and-auth-errors
May 14, 2026

Conversation

ding113 (Owner) commented May 14, 2026

Summary

Fixes three user-reported bugs:

  • Disabled API keys returned HTTP 429 "Too many authentication failures" instead of "key disabled". Every request with a disabled/expired key tripped the pre-auth brute-force rate limiter; after 20 attempts the IP/key got locked out and the real 401 was masked forever.
  • Provider availability dashboard was very slow. Commit 095c607e (fix(metrics): align success-rate outcome semantics #1080) replaced the SARGable status_code IS NOT NULL finalized predicate with a fn_is_message_request_finalized(...) PL/pgSQL call. PostgreSQL never inlines PL/pgSQL, so the partial index idx_message_request_provider_created_at_finalized_active was no longer usable and the query degraded to a sequential scan.
  • Public status page Redis keys leaked indefinitely. Commit b2497a46 (feat: add redis-projected public status page #1056) added Redis-backed config snapshots written via bare redis.set(...) (no TTL). Every config-version mint (provider / group / system settings change) created a new versioned snapshot key that never expired — public-status:v1:config:* and the internal variant grew without bound on busy operators. Neighbouring projection writers in rebuild-worker.ts were already using a 30-day TTL; only the config publishing path was missed.

Changes

Auth chain (fix(auth))

  • New resolveApiKeyAuthOutcome in src/repository/key.ts returning a discriminated union (not_found / key_disabled / key_expired / ok). validateApiKeyAndGetUser is kept as a back-compat wrapper. Duplicate-row handling: keys.key has no unique constraint; the classifier prefers an active duplicate, falls back to key_expired when any duplicate is enabled, else key_disabled.
  • ProxyAuthenticator (proxy auth guard) maps each reason to a distinct 401 (invalid_api_key, key_disabled, key_expired, user_disabled, user_expired). The branch is an exhaustive switch with assertNever — adding a new ApiKeyAuthFailureReason produces a TS error until the new branch is handled.
  • New AuthFailureKind tag on AuthState; the guard only feeds credentials failures (missing/mismatched/unknown key) into the brute-force rate limiter. Admin-disabled or expired keys/users bypass it entirely. A buildAuthFailure factory forces every failure call site to tag both failureKind and errorResponse at compile time.
  • The /v1/models chain (available-models.ts) is updated the same way.
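
The rate-limiter gating described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory limiter (`FailureLimiter`) and the 20-attempt threshold mentioned in the summary; the project's actual limiter is not shown in this description, and only the two failure kinds come from the PR text.

```typescript
// Hypothetical names throughout; only "credentials" / "account_state" come from the PR.
type AuthFailureKind = "credentials" | "account_state";

class FailureLimiter {
  private readonly failures = new Map<string, number>();
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  isLockedOut(clientId: string): boolean {
    return (this.failures.get(clientId) ?? 0) >= this.threshold;
  }

  recordFailure(clientId: string): void {
    this.failures.set(clientId, (this.failures.get(clientId) ?? 0) + 1);
  }
}

interface AuthFailure {
  status: 401;
  code: string;
  failureKind: AuthFailureKind;
}

// Returns the HTTP status the client would see for one failed attempt.
function handleFailedAttempt(
  limiter: FailureLimiter,
  clientId: string,
  failure: AuthFailure,
): number {
  if (limiter.isLockedOut(clientId)) return 429;
  // Only credential failures count toward the brute-force budget.
  if (failure.failureKind !== "account_state") {
    limiter.recordFailure(clientId);
  }
  return failure.status;
}
```

With this shape, repeated requests using a disabled key keep receiving the 401, while repeated unknown-key attempts still reach the lockout.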

Availability dashboard (perf(availability))

  • buildAvailabilityFinalizedCondition inlines a semantically-equivalent SQL expression so status_code IS NOT NULL becomes the dominant SARGable branch and the planner can use the partial index. The PL/pgSQL function definition is unchanged (still called by the upsert trigger on the write path); a header comment marks the keep-in-sync requirement.
  • The provider-chain JSONB branch is wrapped in a CASE so jsonb_array_length cannot evaluate against a non-array historical row (PG does not guarantee AND short-circuit). FINALIZED_PROVIDER_CHAIN_REASONS is extracted as a top-level constant; a comment documents the ? JSONB operator's driver assumption.
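
The logic of the inlined predicate may be easier to follow as plain TypeScript. The sketch below mirrors the SQL branches quoted in the review thread (including the reason list); it is illustrative only, not code from the repository, and `MessageRequestRow` and `isFinalized` are hypothetical names.

```typescript
const FINALIZED_PROVIDER_CHAIN_REASONS = new Set<string>([
  "request_success", "retry_success", "retry_failed", "system_error",
  "resource_not_found", "client_error_non_retryable",
  "concurrent_limit_failed", "hedge_winner", "hedge_loser_cancelled",
  "client_abort",
]);

// Hypothetical row shape for illustration.
interface MessageRequestRow {
  statusCode: number | null;
  blockedBy: string | null;
  errorMessage: string | null;
  providerChain: unknown; // JSONB: may be an array, an object, a scalar, or null
}

function isFinalized(row: MessageRequestRow): boolean {
  // Cheap column checks first, matching the order of the SQL branches.
  if (row.statusCode !== null) return true;
  if (row.blockedBy !== null) return true;
  if ((row.errorMessage ?? "") !== "") return true;

  // Type guard before touching array operations (the role of CASE in SQL).
  const chain = row.providerChain;
  if (!Array.isArray(chain) || chain.length === 0) return false;

  const last: unknown = chain[chain.length - 1];
  if (typeof last !== "object" || last === null || Array.isArray(last)) return false;

  const item = last as Record<string, unknown>;
  const reason = item["reason"];
  if (typeof reason === "string" && FINALIZED_PROVIDER_CHAIN_REASONS.has(reason)) return true;
  if (typeof item["statusCode"] === "number") return true;

  // Any non-null, non-empty errorMessage value counts.
  const message = item["errorMessage"];
  return message !== undefined && message !== null && message !== "";
}
```

A non-array `providerChain` simply yields `false` here; in SQL the same guard has to be forced with `CASE` because evaluation order of `AND` operands is not guaranteed.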

Public status TTL (fix(public-status))

  • New PUBLIC_STATUS_CONFIG_TTL_SECONDS = 60 * 60 * 24 * 30 (matches GENERATION_PROJECTION_TTL_SECONDS in rebuild-worker.ts).
  • All four redis.set(...) writes in publishPublicStatusConfigSnapshot, publishInternalPublicStatusConfigSnapshot, and publishCurrentPublicStatusConfigPointers now use (key, value, "EX", seconds). The Lua script in the pointer publisher's eval path applies SET ... EX <ttl> atomically with the version compare so the TTL refreshes on every successful publish.
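
As a rough illustration of the write shape, the sketch below uses a hypothetical in-memory stand-in for the Redis client (`FakeRedis`, synchronous for brevity) and a made-up key name. Only the constant's value and the `(key, value, "EX", seconds)` argument order come from the description above.

```typescript
const PUBLIC_STATUS_CONFIG_TTL_SECONDS = 60 * 60 * 24 * 30; // 30 days = 2,592,000 seconds

// Minimal in-memory stand-in that records TTLs; a real client would return promises.
class FakeRedis {
  readonly values = new Map<string, string>();
  readonly ttlByKey = new Map<string, number | null>();

  set(key: string, value: string, mode?: "EX", seconds?: number): "OK" {
    this.values.set(key, value);
    // null models a key written without expiry.
    this.ttlByKey.set(key, mode === "EX" && seconds !== undefined ? seconds : null);
    return "OK";
  }
}

function publishSnapshot(redis: FakeRedis, key: string, snapshot: object): void {
  // A bare set(key, value) would leave the key without expiry.
  redis.set(key, JSON.stringify(snapshot), "EX", PUBLIC_STATUS_CONFIG_TTL_SECONDS);
}
```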

Test plan

  • New unit tests in tests/unit/proxy/auth-guard-account-state.test.ts (7 cases — disabled key, expired key, disabled user, expired user, unknown key, missing creds, plus a "25 disabled-key attempts never trip 429" regression case)
  • New unit tests in tests/unit/repository/key-resolve-outcome.test.ts (9 cases including 3 dedicated to multi-row duplicate-key scenarios)
  • Updated assertions in tests/unit/lib/availability-service.test.ts guard against the slow function-call form returning and assert the new CASE shape
  • New unit tests in tests/unit/public-status/config-snapshot.test.ts (4 cases — publishPublicStatusConfigSnapshot, publishInternalPublicStatusConfigSnapshot, and both Lua / non-Lua pointer-publish paths each assert EX + PUBLIC_STATUS_CONFIG_TTL_SECONDS)
  • Pre-push checklist green: bun run build, bun run lint, bun run typecheck, bun run test (5964 tests pass)
  • Manual: disable a key in admin UI, hit the proxy 25 times in a row, confirm every response is 401 key_disabled and no 429 lockout
  • Manual: load /dashboard/availability against a populated DB, confirm noticeable speed-up vs. dev
  • Manual: run redis-cli --scan --pattern 'public-status:v1:config:*' | xargs -I {} redis-cli TTL {} and confirm every key has a positive TTL

🤖 Generated with Claude Code

Greptile Summary

This PR fixes three distinct production bugs: a brute-force rate-limiter incorrectly recording disabled/expired API key failures (causing 429 lockouts), a PL/pgSQL call making the availability dashboard query non-SARGable (sequential scan), and Redis versioned config-snapshot keys accumulating without a TTL.

  • Auth chain: resolveApiKeyAuthOutcome returns a discriminated union distinguishing not_found / key_disabled / key_expired; the proxy guard's exhaustive switch maps each reason to a distinct 401 and gates rate-limiter recording behind a failureKind !== "account_state" check. Multi-row duplicate-key handling correctly prefers active rows.
  • Availability dashboard: buildAvailabilityFinalizedCondition inlines the PL/pgSQL predicate as explicit SQL so status_code IS NOT NULL remains SARGable; JSONB array operations are wrapped in CASE to guard against non-array rows.
  • Public-status TTL: Versioned config snapshot keys now carry a 30-day TTL. Pointer keys are intentionally left without TTL, but pointer keys store references to versioned keys, so if the versioned key expires after 30 idle days, the status page silently goes dark.

File analyses

  • src/repository/key.ts (confidence 5): Adds resolveApiKeyAuthOutcome returning a discriminated union; widens the DB query to include disabled/expired keys and classifies them in JS with correct multi-row deduplication logic.
  • src/app/v1/_lib/proxy/auth-guard.ts (confidence 5): Exhaustive switch on ApiKeyAuthFailureReason with assertNever; buildAuthFailure factory enforces failureKind on every failure path; account-state failures correctly bypass the brute-force rate-limiter.
  • src/lib/availability/availability-service.ts (confidence 5): Inlines the PL/pgSQL function as a SQL expression to restore SARGability; wraps JSONB array ops in a CASE to guard against non-array rows.
  • src/lib/public-status/config-snapshot.ts (confidence 3): Adds 30-day TTL to versioned snapshot keys and correctly withholds TTL from pointer keys, but pointer keys reference versioned snapshot keys that also expire, causing the status page to go dark on deployments idle for 30+ days.
  • tests/unit/proxy/auth-guard-account-state.test.ts (confidence 5): New test file with 7 cases including the regression scenario (25 disabled-key attempts never trigger 429).
  • tests/unit/repository/key-resolve-outcome.test.ts (confidence 5): New test file with 9 cases including 3 multi-row duplicate-key scenarios verifying active > expired > disabled classification.

Confidence Score: 4/5

The auth and availability fixes are solid and well-tested; the public-status TTL fix risks silently darkening the status page on idle deployments.

The auth refactor and availability index fix are correct and comprehensively tested. The public-status TTL logic introduces a scenario where an idle deployment with no config change for 30 days loses its versioned snapshot while the pointer key survives, causing the read path to return null with no fallback — a regression from the pre-PR state where no data ever expired.

src/lib/public-status/config-snapshot.ts — the TTL applied to versioned snapshot keys needs to be substantially longer, or the read path needs a fallback when a versioned key has expired.

Comments Outside Diff (1)

  1. src/lib/public-status/config-snapshot.ts, line 234-241 (link)

    P1 Versioned snapshot expiry silences the status page on idle deployments

    The pointer keys correctly have no TTL, but each pointer stores a reference to a versioned snapshot key that carries the 30-day TTL. In readCurrentPublicStatusConfigSnapshot, once currentVersion is non-null (the version pointer key has no TTL so it survives indefinitely), the function returns immediately from reading the versioned snapshot — with no fallback if that key has expired. On any deployment that goes 30 days without a config change, the pointer is alive but the snapshot it references has silently expired, causing the status page to return null and go dark.

    The legacy pointer path has the same problem: buildPublicStatusConfigSnapshotKey() (no TTL) stores a reference to the versioned key path, and that versioned key also expires after 30 days.

    A straightforward fix is to use a much longer TTL (e.g., one year). Config accumulation remains bounded while idle deployments do not silently lose snapshot data.
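
The failure mode, and one possible read-path guard, can be sketched with an in-memory store. Everything here is hypothetical: the store, the key names, and `readSnapshot` are stand-ins rather than the project's `readCurrentPublicStatusConfigSnapshot`. The sketch only shows how a dangling pointer could be told apart from a missing one so that a caller can rebuild or republish.

```typescript
type ReadResult =
  | { kind: "ok"; snapshot: string }
  | { kind: "no_pointer" }
  | { kind: "dangling_pointer"; version: string };

// `store` models Redis after expiry has run: expired keys are simply absent.
function readSnapshot(store: Map<string, string>, pointerKey: string): ReadResult {
  const version = store.get(pointerKey);
  if (version === undefined) return { kind: "no_pointer" };

  const snapshot = store.get(`config:${version}`);
  // The pointer survived (no TTL) but the versioned key expired:
  // report it explicitly instead of returning null.
  if (snapshot === undefined) return { kind: "dangling_pointer", version };

  return { kind: "ok", snapshot };
}
```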

Reviews (5): Last reviewed commit: "fix(public-status): drop TTL from curren..." | Re-trigger Greptile

ding113 and others added 2 commits May 14, 2026 23:35
Disabled keys hit the proxy pre-auth rate limiter on every request,
incremented the failure counter, and locked the IP/key out with HTTP 429
"Too many authentication failures" after 20 attempts. The legitimate
"key disabled" 401 was masked forever once the lockout tripped.

Introduce `resolveApiKeyAuthOutcome` returning a discriminated union
(`not_found` / `key_disabled` / `key_expired` / ok), map each reason to
its own 401 error, and only feed `credentials` failures to the
rate-limiter — admin-disabled or expired keys/users now bypass it
entirely. `validateApiKeyAndGetUser` stays as a backwards-compatible
wrapper. The /v1/models chain is updated the same way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The availability dashboard query was slow because the WHERE clause
called the PL/pgSQL function `fn_is_message_request_finalized(...)`.
PostgreSQL never inlines PL/pgSQL functions, so the predicate became
opaque to the planner and the partial index
`idx_message_request_provider_created_at_finalized_active`
(predicate `status_code IS NOT NULL AND deleted_at IS NULL`) was no
longer usable — the dashboard fell back to a sequential scan that
re-evaluated the function per row.

Inline a semantically-equivalent SQL expression so `status_code IS NOT
NULL` becomes the dominant SARGable branch. The SQL function definition
is unchanged (still called by the upsert trigger on the write path); a
header comment marks the keep-in-sync requirement. Tests assert on the
inlined form to guard against the slow function-call form returning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

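This PR introduces a structured API key authentication outcome (resolveApiKeyAuthOutcome) and, in the proxy guard and the models endpoint, decides by failure type whether a failure counts toward brute-force rate limiting; inlines the availability finalized check as an indexable SQL predicate; and adds a Redis TTL to public-status snapshot writes, updating the related tests and interface hints.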

Changes

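API key authentication outcome API and rate-limit integration

Layer / File(s) Summary
Core definition and implementation of the authentication outcome API
src/repository/key.ts, src/repository/index.ts
Adds ApiKeyAuthFailureReason and ApiKeyAuthOutcome, implements resolveApiKeyAuthOutcome(keyString) (Vacuum Filter → Redis → DB chain, followed by classification into enabled/expired/disabled), and has validateApiKeyAndGetUser call this implementation to stay compatible; exposes the new types and function in the repository exports.
Proxy guard authentication and failure classification
src/app/v1/_lib/proxy/session.ts, src/app/v1/_lib/proxy/auth-guard.ts
Adds the AuthFailureKind type and the AuthState.failureKind field; auth-guard uses resolveApiKeyAuthOutcome and sets failureKind (credentials / account_state) according to outcome.reason; ensure calls proxyAuthPolicy.recordFailure only when failureKind !== "account_state", so account-state failures are not counted toward rate limiting.
Models endpoint authentication integration
src/app/v1/_lib/models/available-models.ts
authenticateRequest now calls resolveApiKeyAuthOutcome and maps outcome.reason to distinct 401 error types and messages (using localized messages); on success it destructures user and key for subsequent validation.
Unit tests for the authentication outcome API
tests/unit/repository/key-resolve-outcome.test.ts, tests/unit/proxy/auth-guard-account-state.test.ts
Adds key-resolve-outcome.test.ts covering each branch of resolveApiKeyAuthOutcome and duplicate-row classification; adds auth-guard-account-state.test.ts covering account-state failures, unknown keys, missing credentials, and the scenario in which rate limiting is not triggered.
Existing tests adapted to the new authentication API
tests/unit/proxy/auth-guard-precheck.test.ts, tests/unit/models/available-models-gemini-key.test.ts, tests/unit/proxy/available-models.test.ts
Updates the @/repository/key mocks in tests, switching from validateApiKeyAndGetUser to resolveApiKeyAuthOutcome and adapting to its { ok, reason } / { ok: true, user, key } shape; injects next-intl / getErrorMessageServer mocks in tests for localized error messages.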

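Database query optimization

Layer / File(s) Summary
Inlined implementation of the finalized condition
src/lib/availability/availability-service.ts
Adds FINALIZED_PROVIDER_CHAIN_REASONS and FINALIZED_PROVIDER_CHAIN_REASONS_SQL, and changes buildAvailabilityFinalizedCondition() from calling fn_is_message_request_finalized(...) to an inlined, equivalent SQL predicate (statusCode / blockedBy / errorMessage / guarded CASE logic on the last providerChain element), preserving index usability and keeping semantic equivalence with the drizzle SQL.
Test updates for the finalized condition
tests/unit/lib/availability-service.test.ts
Updates test assertions to verify that the generated SQL does not contain fn_is_message_request_finalized and does contain the inlined fragments ("status_code" is not null, "blocked_by" is not null, "provider_chain" -> -1 ->> 'reason', and the jsonb type-guard CASE logic).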

Public-status Redis TTL

Layer / File(s) Summary
Introduce TTL and change write logic
src/lib/public-status/config-snapshot.ts
Adds the PUBLIC_STATUS_CONFIG_TTL_SECONDS constant, extends the RedisWriter.set signature, and adds an EX TTL to the writes in publishPublicStatusConfigSnapshot / publishInternalPublicStatusConfigSnapshot / publishCurrentPublicStatusConfigPointers (the Lua eval branch also passes the TTL as an ARGV).
Test updates
tests/unit/public-status/config-snapshot.test.ts
Extends the test module interface to include PUBLIC_STATUS_CONFIG_TTL_SECONDS, updates the redis.set mock contract to a variadic form, and adds assertions that versioned snapshot key writes carry EX and the TTL while pointer key writes carry no TTL (including eval-path assertions).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ding113/claude-code-hub#1040: Both PRs modify src/app/v1/_lib/proxy/auth-guard.ts inside ProxyAuthenticator.ensure/validate—changing how candidate API keys and auth outcomes drive failure/success recording and failure-kind/rate-limit behavior.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the PR's three main fixes: the disabled-key 429 lockout, the availability page performance problem, and the public-status Redis leak.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
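Description check ✅ Passed The PR description is closely related to the changeset and clearly lays out the three user-reported bug fixes, the key changes, and the test plan.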

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5d7449d87

Comment thread src/repository/key.ts
Comment on lines 688 to +690
  .from(keys)
  .innerJoin(users, eq(keys.userId, users.id))
- .where(
-   and(
-     eq(keys.key, keyString),
-     isNull(keys.deletedAt),
-     eq(keys.isEnabled, true),
-     or(isNull(keys.expiresAt), gt(keys.expiresAt, new Date())),
-     isNull(users.deletedAt)
-   )
- );
+ .where(and(eq(keys.key, keyString), isNull(keys.deletedAt), isNull(users.deletedAt)));

P1: Make auth lookup deterministic when duplicate key rows exist

Because this query now returns all non-deleted rows for a key string and later uses result[0], authentication outcome depends on arbitrary row order when duplicate keys.key values exist (the schema has no unique constraint on keys.key). A disabled/expired duplicate can be picked before an active duplicate, causing valid credentials to be rejected as key_disabled/key_expired. Before this change, the SQL filter only returned active rows, so this mixed-state false rejection path was not present.
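
One way to make the outcome independent of row order is sketched below, following the priority the PR description states: an active duplicate wins, then `key_expired` if any duplicate is enabled, otherwise `key_disabled`. `KeyRow` and `classifyKeyRows` are hypothetical names, not the repository's code.

```typescript
// Hypothetical row shape; only the outcome reasons come from the PR text.
interface KeyRow {
  isEnabled: boolean;
  expiresAt: Date | null;
}

type KeyOutcome =
  | { ok: true; row: KeyRow }
  | { ok: false; reason: "not_found" | "key_disabled" | "key_expired" };

function classifyKeyRows(rows: KeyRow[], now: Date): KeyOutcome {
  if (rows.length === 0) return { ok: false, reason: "not_found" };

  // Scan every row rather than trusting rows[0], so ordering cannot matter.
  const active = rows.find(
    (r) => r.isEnabled && (r.expiresAt === null || r.expiresAt.getTime() > now.getTime()),
  );
  if (active) return { ok: true, row: active };

  // No active row: any enabled row must be expired; otherwise all rows are disabled.
  const anyEnabled = rows.some((r) => r.isEnabled);
  return { ok: false, reason: anyEnabled ? "key_expired" : "key_disabled" };
}
```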

@github-actions github-actions Bot added the size/L Large PR (< 1000 lines) label May 14, 2026
@github-actions
Contributor

🧪 Test results

Test type Status
Code quality
Unit tests
Integration tests
API tests

Overall result: ✅ All tests passed

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the API key authentication logic by introducing resolveApiKeyAuthOutcome, which distinguishes between missing, disabled, and expired keys. This change allows the ProxyAuthenticator to skip rate-limiting for administrative account-state failures, preventing potential lockouts. Additionally, the availability-service.ts was optimized by inlining SQL logic to improve query planning and index utilization. Feedback suggests extracting hardcoded status reasons into a shared constant to mitigate maintenance risks.

Comment on lines 83 to 110
  function buildAvailabilityFinalizedCondition() {
-   return sql`fn_is_message_request_finalized(
-     ${messageRequest.blockedBy},
-     ${messageRequest.statusCode},
-     ${messageRequest.providerChain},
-     ${messageRequest.errorMessage}
+   // The `IS NOT NULL` checks below are individually SARGable. Listing
+   // status_code first encourages the planner to scan the partial index.
+   return sql`(
+     ${messageRequest.statusCode} IS NOT NULL
+     OR ${messageRequest.blockedBy} IS NOT NULL
+     OR COALESCE(${messageRequest.errorMessage}, '') <> ''
+     OR (
+       ${messageRequest.providerChain} IS NOT NULL
+       AND jsonb_typeof(${messageRequest.providerChain}) = 'array'
+       AND jsonb_array_length(${messageRequest.providerChain}) > 0
+       AND jsonb_typeof(${messageRequest.providerChain} -> -1) = 'object'
+       AND (
+         (${messageRequest.providerChain} -> -1 ->> 'reason') IN (
+           'request_success', 'retry_success', 'retry_failed', 'system_error',
+           'resource_not_found', 'client_error_non_retryable',
+           'concurrent_limit_failed', 'hedge_winner', 'hedge_loser_cancelled',
+           'client_abort'
+         )
+         OR (
+           (${messageRequest.providerChain} -> -1 ? 'statusCode')
+           AND jsonb_typeof(${messageRequest.providerChain} -> -1 -> 'statusCode') = 'number'
+         )
+         OR COALESCE(${messageRequest.providerChain} -> -1 ->> 'errorMessage', '') <> ''
+       )
+     )
    )`;
  }

medium

The inlining of the fn_is_message_request_finalized logic into this SQL template is a significant performance improvement as it allows the PostgreSQL planner to utilize partial indexes. However, this creates a maintenance risk as the logic is now duplicated between the TypeScript code and the PL/pgSQL function in the database.

Consider extracting the list of finalized reasons (lines 97-100) into a shared constant at the top of the file to improve readability and ensure consistency if these reasons are used elsewhere in the service.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/v1/_lib/models/available-models.ts`:
- Around line 66-67: The thrown error responses in available-models.ts (the
throw c.json calls that return messages like "API 密钥已被禁用" and the other messages
around the same block at lines ~69-73) must stop returning hardcoded user-facing
strings and instead return i18n keys; update the code that throws these errors
(the throw c.json(...) statements) to use defined translation keys (e.g.,
models.key_disabled, models.invalid_key, etc.) and ensure the response payload
carries the key (and optional interpolation data) so the central translation
layer will render the localized text for zh-CN/zh-TW/en/ja/ru before sending to
the client; keep the error structure ({ error: { messageKey: "...", type: "..."
} } or follow the project’s established i18n error contract) and replace all
hardcoded messages in that block (including the ones referenced at 69-73)
accordingly.

In `@src/app/v1/_lib/proxy/auth-guard.ts`:
- Around line 188-191: The 401 error message passed to ProxyResponses.buildError
is hardcoded; replace literal user-facing strings in auth-guard.ts (the
ProxyResponses.buildError calls for the invalid API key / disabled / expired
branches) with i18n keys (e.g. use keys for "invalid_api_key", "key_disabled",
"key_expired") and pass those keys into the response payload so the translation
layer renders the localized text; update the three affected calls (the
buildError invocations around the invalid_api_key, key_disabled, and key_expired
branches) and ensure corresponding entries are added to the i18n resources for
zh-CN, zh-TW, en, ja, and ru.

In `@src/lib/availability/availability-service.ts`:
- Around line 91-106: The SQL predicate using ${messageRequest.providerChain}
relies on short-circuiting and can still call jsonb_array_length/jsonb_typeof on
non-array values causing query errors; change the WHERE fragment to use a CASE
(or a NULLIF/WHEN) guard that first checks
jsonb_typeof(${messageRequest.providerChain}) = 'array' and only then evaluates
jsonb_array_length(...) and the -> -1 expressions so evaluation order is forced,
and add a regression test feeding a non-array provider_chain (object, scalar,
and JSON null) to the code path that builds this query to assert it no longer
throws a 500.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 979c59c4-0c49-4384-bf62-b28339970251

📥 Commits

Reviewing files that changed from the base of the PR and between 724f070 and b5d7449.

📒 Files selected for processing (12)
  • src/app/v1/_lib/models/available-models.ts
  • src/app/v1/_lib/proxy/auth-guard.ts
  • src/app/v1/_lib/proxy/session.ts
  • src/lib/availability/availability-service.ts
  • src/repository/index.ts
  • src/repository/key.ts
  • tests/unit/lib/availability-service.test.ts
  • tests/unit/models/available-models-gemini-key.test.ts
  • tests/unit/proxy/auth-guard-account-state.test.ts
  • tests/unit/proxy/auth-guard-precheck.test.ts
  • tests/unit/proxy/available-models.test.ts
  • tests/unit/repository/key-resolve-outcome.test.ts

Comment thread src/app/v1/_lib/models/available-models.ts Outdated
Comment thread src/app/v1/_lib/proxy/auth-guard.ts Outdated
Comment thread src/lib/availability/availability-service.ts Outdated
Comment on lines 47 to 58
export interface AuthState {
user: User | null;
key: Key | null;
apiKey: string | null;
success: boolean;
errorResponse?: Response; // detailed error response when authentication fails
/**
* Set when `success` is false. Determines whether the proxy auth guard
* records the failure against the IP/key rate-limiter.
*/
failureKind?: AuthFailureKind;
}

P2 failureKind should be required on failure states

AuthState uses a flat success: boolean shape, so TypeScript does not enforce that failureKind is always present when success is false. The guard check authState.failureKind !== "account_state" silently treats undefined as a credentials failure — the safe default, but only by accident. Any future failure branch that forgets the field will record against the rate-limiter without any compile-time warning. Modeling this as a discriminated union would make the constraint explicit and prevent the footgun entirely.
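
A discriminated-union shape along the lines suggested here might look like the sketch below. The `User`, `Key`, and `ErrorResponse` types are simplified placeholders for the project's own types, and `buildAuthFailure` / `shouldRecordFailure` are written from the PR description rather than copied from the code.

```typescript
type AuthFailureKind = "credentials" | "account_state";

// Placeholder types standing in for the project's User, Key, and Response.
interface User { id: number }
interface Key { id: number }
interface ErrorResponse { status: number; code: string }

type AuthState =
  | { success: true; user: User; key: Key; apiKey: string }
  | {
      success: false;
      user: null;
      key: null;
      apiKey: string | null;
      failureKind: AuthFailureKind; // required on every failure state
      errorResponse: ErrorResponse;
    };

// A single factory so no call site can omit failureKind or errorResponse.
function buildAuthFailure(
  apiKey: string | null,
  failureKind: AuthFailureKind,
  errorResponse: ErrorResponse,
): AuthState {
  return { success: false, user: null, key: null, apiKey, failureKind, errorResponse };
}

function shouldRecordFailure(state: AuthState): boolean {
  // Narrowing on `success` makes `failureKind` available and never undefined.
  return !state.success && state.failureKind === "credentials";
}
```

Because `failureKind` exists only on the failure member and is not optional, a failure branch that forgets it fails to type-check instead of defaulting silently.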

Comment thread src/app/v1/_lib/proxy/auth-guard.ts Outdated
Comment on lines 214 to 229
  // outcome.reason === "key_expired"
  logger.warn("[ProxyAuthenticator] API key has expired", {
    apiKeyLength: apiKey.length,
    fromHeader: !!headers.authHeader || !!headers.apiKeyHeader || !!headers.geminiApiKeyHeader,
    fromQuery: !!headers.geminiApiKeyQuery,
  });
  return {
    user: null,
    key: null,
    apiKey,
    success: false,
    failureKind: "account_state",
    errorResponse: ProxyResponses.buildError(
      401,
-     "API 密钥无效。提供的密钥不存在、已被删除、已被禁用或已过期。",
-     "invalid_api_key"
+     "API 密钥已过期。请联系管理员续期或更换密钥。",
+     "key_expired"
    ),
  };

P2 Non-exhaustive fallthrough on outcome.reason

The comment // outcome.reason === "key_expired" is correct today because the union has exactly three members, but there is no compile-time exhaustiveness check. If a new ApiKeyAuthFailureReason is ever added (e.g., "key_revoked"), the new variant silently falls into the key_expired handler, sends the wrong error message, and — critically — classifies the failure as account_state when it might need to be credentials. A switch with an assertNever exhaustiveness guard, or an explicit else branch that asserts the type, would surface this at build time instead.
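
The exhaustiveness guard suggested here can be sketched as below. The error codes and failure kinds follow the PR description; `mapFailureReason` and `FailureMapping` are hypothetical names used only for illustration.

```typescript
type ApiKeyAuthFailureReason = "not_found" | "key_disabled" | "key_expired";

// Accepts only `never`, so passing an unhandled union member is a compile error.
function assertNever(value: never): never {
  throw new Error(`Unhandled reason: ${String(value)}`);
}

interface FailureMapping {
  errorCode: string;
  failureKind: "credentials" | "account_state";
}

function mapFailureReason(reason: ApiKeyAuthFailureReason): FailureMapping {
  switch (reason) {
    case "not_found":
      return { errorCode: "invalid_api_key", failureKind: "credentials" };
    case "key_disabled":
      return { errorCode: "key_disabled", failureKind: "account_state" };
    case "key_expired":
      return { errorCode: "key_expired", failureKind: "account_state" };
    default:
      // Adding a member to the union without a matching case makes this a type error.
      return assertNever(reason);
  }
}
```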

Comment on lines +102 to +105
OR (
(${messageRequest.providerChain} -> -1 ? 'statusCode')
AND jsonb_typeof(${messageRequest.providerChain} -> -1 -> 'statusCode') = 'number'
)

P2 Raw ? JSONB operator in Drizzle sql template

(${messageRequest.providerChain} -> -1 ? 'statusCode') uses the PostgreSQL JSONB key-existence operator ?. With the pg driver (which parameterizes via $N) the raw ? passes through to the server unchanged and works correctly. However, some connection-pooling middlewares or alternative drivers (e.g., postgres.js) treat bare ? as a positional-parameter placeholder and will mangle the query. Consider adding a brief comment documenting the driver assumption here, in case the underlying driver is ever swapped.
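
To illustrate the risk, here is a small sketch of what happens when some layer rewrites every `?` into a positional parameter. `rewriteQuestionMarks` is a hypothetical stand-in for such a layer, not the behaviour of any specific driver, and the column name `provider_chain` is assumed from the commit message.

```typescript
// Hypothetical placeholder rewriter: turns each `?` into $1, $2, ...
// It cannot tell a parameter marker from the JSONB key-existence operator.
function rewriteQuestionMarks(sql: string): string {
  let index = 0;
  return sql.replace(/\?/g, () => `$${++index}`);
}

const predicate = "(provider_chain -> -1 ? 'statusCode')";
const mangled = rewriteQuestionMarks(predicate);

// The operator has been replaced, so the server would reject the query.
console.log(mangled); // (provider_chain -> -1 $1 'statusCode')
```

A comment next to the predicate stating that the driver passes `?` through unchanged is the cheapest guard against this class of breakage.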

@github-actions github-actions Bot left a comment

Code Review Summary

This PR fixes two distinct production issues: a 429 lockout caused by disabled API keys and a slow availability dashboard due to a non-inlinable PL/pgSQL predicate. Both fixes are well-reasoned, thoroughly tested, and the auth discriminated-union refactor correctly isolates brute-force signals from admin actions.

PR Size: L

  • Lines changed: 744 (685 additions, 59 deletions)
  • Files changed: 12
  • Split suggestion: This PR contains two independent fixes that could ship separately: (1) the auth-chain rate-limiter routing (7 files, ~300 lines), and (2) the availability predicate inlining (2 files, ~100 lines). Splitting would make rollback safer if either fix causes unexpected issues.

Issues Found

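| Category | Critical | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Logic/Bugs | 0 | 1 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 1 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |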

High Priority Issues (Should Fix)

1. Non-exhaustive ApiKeyAuthFailureReason handling in auth-guard.ts

File: src/app/v1/_lib/proxy/auth-guard.ts:214
Confidence: 95/100

The sequential if statements handling outcome.reason lack a compile-time exhaustiveness guard. If ApiKeyAuthFailureReason is ever expanded (e.g., adding "key_revoked"), the new variant silently falls through to the key_expired handler. This misclassifies the failure as account_state (bypassing the rate limiter) and returns the wrong error message. The same pattern exists in available-models.ts (line 71) but with less severe consequences since it falls through to invalid_api_key (credentials class).

Suggested fix: Convert to an if/else if chain with a never exhaustiveness guard:

if (!outcome.ok) {
  if (outcome.reason === "not_found") {
    // ... existing handler
  } else if (outcome.reason === "key_disabled") {
    // ... existing handler
  } else if (outcome.reason === "key_expired") {
    // ... existing handler
  } else {
    const _exhaustiveCheck: never = outcome.reason;
    throw new Error(`Unhandled auth failure reason: ${_exhaustiveCheck}`);
  }
}

Review Coverage

  • Logic and correctness - One exhaustiveness fragility identified
  • Security (OWASP Top 10) - Clean; auth failure classification is sound
  • Error handling - Clean; all failure paths log and return precise errors
  • Type safety - One type-union exhaustiveness gap in auth-guard.ts
  • Documentation accuracy - Clean; excellent comments on inlined SQL predicate
  • Test coverage - Adequate; 13 new unit tests cover all auth outcomes + regression
  • Code clarity - Good; discriminated union refactor improves readability

Automated review by Claude AI

Comment thread src/app/v1/_lib/proxy/auth-guard.ts Outdated

if (!authResult) {
logger.debug("[ProxyAuthenticator] API key validation failed", {
// outcome.reason === "key_expired"

[High] [LOGIC-BUG] Non-exhaustive ApiKeyAuthFailureReason handling

Why this is a problem: The sequential if statements for outcome.reason lack a compile-time exhaustiveness guard. If ApiKeyAuthFailureReason is ever expanded (e.g., adding "key_revoked"), the new variant silently falls through to the key_expired handler below. This returns the wrong error message and misclassifies the failure as account_state, bypassing the brute-force rate limiter.

Suggested fix: Convert to if/else if with a never exhaustiveness guard in the final else branch.

- key.ts: classify duplicate-row matches deterministically. The relaxed
  WHERE clause can return multiple non-deleted rows for one key string
  (no unique constraint on keys.key) and result[0] was non-deterministic;
  prefer an active row, fall back to "any enabled = key_expired", else
  key_disabled. (chatgpt-codex P1)

- availability-service.ts: wrap the provider_chain jsonb branch in a
  CASE expression so jsonb_array_length cannot run on a non-array row.
  PostgreSQL does not guarantee AND short-circuit, so a single non-array
  historical row would otherwise crash the dashboard query. Extract the
  finalized provider_chain reason list into FINALIZED_PROVIDER_CHAIN_REASONS
  and document the JSONB `?` operator's driver assumption. (coderabbit P2,
  gemini, greptile)

- auth-guard.ts / available-models.ts: convert the outcome.reason branch
  to an exhaustive switch with assertNever, and introduce a buildAuthFailure
  factory so every failure path is forced to tag its failureKind at compile
  time. Adding a new ApiKeyAuthFailureReason now produces a TypeScript
  error until the new branch is handled. (greptile P2 ×2)

- Tests cover the duplicate-row cases (ok / key_expired / key_disabled
  across mixed-state rows) and assert the CASE guard appears in the
  generated SQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ding113
Owner Author

ding113 commented May 14, 2026

Addressed review feedback in d314561:

Bugs fixed

  • chatgpt-codex P1 (key.ts): the relaxed WHERE clause could return multiple non-deleted rows for the same key string (no unique constraint on keys.key); result[0] was non-deterministic. Now we prefer an active duplicate, fall back to key_expired when any row is still enabled, and only return key_disabled if every duplicate is disabled. Three new regression tests cover mixed-state row sets.
  • coderabbit P2 (availability-service.ts): wrapped the provider_chain JSONB branch in a CASE so jsonb_array_length cannot evaluate against a non-array historical row (PG does not guarantee AND short-circuit). Test asserts the CASE shape now appears in the generated SQL.
  • greptile P2 (auth-guard.ts exhaustiveness): the outcome.reason branch is now an exhaustive switch with assertNever. Adding a new ApiKeyAuthFailureReason will fail to compile until the new branch is handled. Same pattern applied in available-models.ts.
  • greptile P2 (session.ts failureKind footgun): introduced a buildAuthFailure factory so every failure return is forced (by the function signature) to tag both failureKind and errorResponse. Less invasive than refactoring AuthState to a discriminated union (which would require null-narrowing edits at every consumer).
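
The duplicate-row rule in the first bullet can be sketched roughly as follows. The row shape (`isEnabled`, `expiresAt`) and the function name `classifyKeyRows` are assumptions for illustration; the real classifier in src/repository/key.ts may differ.

```typescript
// Hypothetical row shape; the actual schema may use different column names.
interface KeyRow {
  isEnabled: boolean;
  expiresAt: Date | null; // null means the key never expires
}

type Outcome =
  | { ok: true; row: KeyRow }
  | { ok: false; reason: "not_found" | "key_disabled" | "key_expired" };

// Deterministic classification over all non-deleted rows sharing one key string.
function classifyKeyRows(rows: KeyRow[], now: Date): Outcome {
  if (rows.length === 0) return { ok: false, reason: "not_found" };

  const isExpired = (row: KeyRow): boolean =>
    row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime();

  // 1. Prefer any row that is enabled and not expired.
  const active = rows.find((row) => row.isEnabled && !isExpired(row));
  if (active) return { ok: true, row: active };

  // 2. No active row, but at least one is enabled: every enabled row is expired.
  if (rows.some((row) => row.isEnabled)) return { ok: false, reason: "key_expired" };

  // 3. Every duplicate is disabled.
  return { ok: false, reason: "key_disabled" };
}
```

Because the result depends on the whole row set rather than on `result[0]`, the outcome no longer changes with the order in which the database returns the duplicates.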

Nice-to-have

  • gemini-code-assist: extracted FINALIZED_PROVIDER_CHAIN_REASONS constant.
  • greptile (JSONB ? operator): added a comment documenting the pg driver assumption.
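
For the CASE guard mentioned under "Bugs fixed", a rough sketch of the shape is shown below as SQL text built in TypeScript. The helper `guardedNonEmptyArray` and the exact predicate are hypothetical; the point is only that `jsonb_array_length` sits inside a `CASE` branch that is reached only for arrays, instead of relying on `AND` evaluation order.

```typescript
// Hypothetical helper: returns a guarded predicate as SQL text for illustration.
// `column` is assumed to be a trusted column reference, not user input.
function guardedNonEmptyArray(column: string): string {
  return [
    "CASE",
    `  WHEN jsonb_typeof(${column}) = 'array'`,
    `    THEN jsonb_array_length(${column}) > 0`,
    "  ELSE false",
    "END",
  ].join("\n");
}

const guardedSql = guardedNonEmptyArray("provider_chain");
console.log(guardedSql);
```

A row whose `provider_chain` holds an object or scalar then evaluates to `false` rather than raising an error partway through the dashboard query.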

Deferred (with reason)

  • coderabbit (i18n on available-models.ts and auth-guard.ts): the pre-existing 401 error messages in these files (e.g. 用户账户已被禁用) were already hardcoded Chinese strings before this PR. My change preserved the file's existing pattern. Introducing i18n for proxy error JSON payloads is a cross-cutting refactor that should land separately rather than be half-applied to a handful of new branches.

Local pre-push: build ✓, typecheck ✓, lint ✓, all 5960 unit tests ✓.

@github-actions
Contributor

🧪 Test results

Test type Status
Code quality
Unit tests
Integration tests
API tests

Overall result: ✅ All tests passed

const outcome = await resolveApiKeyAuthOutcome(apiKey);
if (!outcome.ok) {
// Exhaustive switch: see auth-guard.ts for rationale. Adding a new
// ApiKeyAuthFailureReason will produce a TypeScript error on the

[HIGH] [TEST-MISSING-CRITICAL] /v1/models key_disabled / key_expired branches have no unit coverage

Why this is a problem: This PR adds new auth behavior in authenticateRequest (distinct 401 key_disabled vs key_expired). There is currently no unit test that asserts these new branches, so a regression back to a generic invalid_api_key (or other behavior) would go undetected. Guideline: 2. **Test Coverage** - All new features must have unit test coverage of at least 80%.

Suggested fix:

// tests/unit/models/available-models-auth-outcome.test.ts
import { describe, expect, it, vi } from "vitest";

vi.mock("@/repository/key", () => ({
  resolveApiKeyAuthOutcome: vi.fn(),
}));

function makeCtx(apiKey: string) {
  return {
    req: {
      path: "/v1/models",
      header: (name: string) => (name.toLowerCase() === "x-api-key" ? apiKey : undefined),
      query: () => undefined,
    },
    json: (body: unknown, status?: number) =>
      new Response(JSON.stringify(body), {
        status: status ?? 200,
        headers: { "content-type": "application/json" },
      }),
  } as any;
}

describe("handleAvailableModels auth outcomes", () => {
  it("returns 401 key_disabled", async () => {
    const { resolveApiKeyAuthOutcome } = await import("@/repository/key");
    vi.mocked(resolveApiKeyAuthOutcome).mockResolvedValueOnce({ ok: false, reason: "key_disabled" });

    const { handleAvailableModels } = await import("@/app/v1/_lib/models/available-models");
    const response = await handleAvailableModels(makeCtx("sk-disabled"));

    expect(response.status).toBe(401);
    const payload = (await response.json()) as { error: { type: string } };
    expect(payload.error.type).toBe("key_disabled");
  });

  it("returns 401 key_expired", async () => {
    const { resolveApiKeyAuthOutcome } = await import("@/repository/key");
    vi.mocked(resolveApiKeyAuthOutcome).mockResolvedValueOnce({ ok: false, reason: "key_expired" });

    const { handleAvailableModels } = await import("@/app/v1/_lib/models/available-models");
    const response = await handleAvailableModels(makeCtx("sk-expired"));

    expect(response.status).toBe(401);
    const payload = (await response.json()) as { error: { type: string } };
    expect(payload.error.type).toBe("key_expired");
  });
});

401,
"提供了多个冲突的 API 密钥。请仅使用一种认证方式。",
"authentication_error"
),

[MEDIUM] [STANDARD-VIOLATION] New auth error messages hardcode user-facing text (bypasses i18n)

Why this is a problem: Guideline: 3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Never hardcode display text. The new responses embed hardcoded message strings (e.g. "API 密钥无效。提供的密钥不存在或已被删除。"), which cannot be localized and will leak a single-language UX into non-zh locales.

Suggested fix:

// src/app/v1/_lib/proxy/auth-guard.ts
const { getLocale, getTranslations } = await import("next-intl/server");
const locale = await getLocale();
const t = await getTranslations({ locale, namespace: "auth" });

return {
  user: null,
  key: null,
  apiKey,
  success: false,
  failureKind: "credentials",
  errorResponse: ProxyResponses.buildError(
    401,
    t("errors.apiKeyNotFoundOrDeleted"),
    "invalid_api_key"
  ),
};

Add the new keys to messages/{locale}/auth.json under errors (e.g. apiKeyNotFoundOrDeleted, apiKeyDisabled, apiKeyExpired) so all supported locales are covered.

@github-actions github-actions Bot left a comment

Code Review Summary

This PR combines two substantial changes (proxy auth error classification and availability dashboard query perf) into a single large review surface.
Two issues need follow-up: the new /v1/models auth outcome branches are not unit-tested, and the new auth error responses add additional hardcoded user-facing strings despite the repo i18n requirement.

PR Size: L

  • Lines changed: 744
  • Files changed: 12
  • Split suggestion: separate the auth-chain changes (key lookup outcome + proxy auth guard + tests) from the availability dashboard perf changes (availability finalized predicate + tests).

Issues Found

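| Category | Critical | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 1 | 0 |
| Tests | 0 | 1 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |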

Critical Issues (Must Fix)

  • None.

High Priority Issues (Should Fix)

  • [HIGH] [TEST-MISSING-CRITICAL] Missing unit coverage for new /v1/models key_disabled / key_expired branches (src/app/v1/_lib/models/available-models.ts:66). Confidence: 90.
  • [MEDIUM] [STANDARD-VIOLATION] New auth error messages hardcode user-facing strings instead of i18n (src/app/v1/_lib/proxy/auth-guard.ts:190). Confidence: 80.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Codex AI

@github-actions github-actions Bot left a comment

  • Identified PR #1187 and applied label size/L (744 lines changed across 12 files).
  • Posted 2 inline review comments on diff lines:
    • src/app/v1/_lib/models/available-models.ts:66 [HIGH] [TEST-MISSING-CRITICAL] missing unit coverage for new key_disabled / key_expired branches.
    • src/app/v1/_lib/proxy/auth-guard.ts:190 [MEDIUM] [STANDARD-VIOLATION] new hardcoded user-facing auth error messages bypass i18n requirement.
  • Submitted the required PR review summary via gh pr review --comment (includes PR size, split suggestion, issue table, and confidence notes).

publishPublicStatusConfigSnapshot, publishInternalPublicStatusConfigSnapshot,
and publishCurrentPublicStatusConfigPointers all wrote keys with bare
redis.set(...) and no TTL. Every config-version mint (provider/group/system
settings change) created a new versioned snapshot key that never expired —
on a busy operator the public-status:v1:config:* and config-internal:* key
namespaces grow without bound. Neighbouring projection writers in
rebuild-worker.ts already use a 30-day TTL via setWithTtl; only the config
publishing path was missed when that pattern was introduced (#1056).

Add PUBLIC_STATUS_CONFIG_TTL_SECONDS (30 days, matching
GENERATION_PROJECTION_TTL_SECONDS in rebuild-worker.ts), widen the local
RedisWriter type to the (key, value, "EX", seconds) ioredis overload, and
apply the TTL to all four call sites — including the Lua script used by
the pointer publisher so SET ... EX is atomic with the version compare.
Each successful publish refreshes the TTL on the live pointer keys, so as
long as configs are published at least every 30 days the active pointer
never expires while stale versioned snapshots get cleaned up naturally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ding113 ding113 changed the title from "fix: disabled-key 429 lockout + slow provider availability page" to "fix: disabled-key 429 lockout, slow availability page, public-status Redis leak" May 14, 2026
@ding113
Owner Author

ding113 commented May 14, 2026

Pushed 26d3b9f0 covering a third user-reported bug — the public-status config snapshots accumulated forever in Redis because the four writes in src/lib/public-status/config-snapshot.ts used bare redis.set(...) without TTL.

Root cause traced to #1056: rebuild-worker.ts correctly used setWithTtl(...) for manifest / series / snapshot projections, but the config publishing path was missed. Fix adds PUBLIC_STATUS_CONFIG_TTL_SECONDS = 30d (matching GENERATION_PROJECTION_TTL_SECONDS), widens the local RedisWriter to accept the (key, value, "EX", seconds) ioredis overload, and applies the TTL to all four sites — including the Lua pointer-update script so SET ... EX is atomic with the version compare. Each successful publish refreshes the TTL on the live pointer keys, so as long as configs publish at least every 30 days the active pointer stays warm while stale versioned snapshots expire naturally.
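
A minimal sketch of the widened writer type and a TTL'd versioned write. `RecordingWriter` and `publishVersionedSnapshot` are hypothetical stand-ins used so the example runs without a Redis server, and the return types are simplified to promises; only the `(key, value, "EX", seconds)` call shape and the constant name come from the description above.

```typescript
const PUBLIC_STATUS_CONFIG_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

// Local writer type covering both the bare and the EX variant of SET.
interface RedisWriter {
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
  set(key: string, value: string): Promise<unknown>;
}

// Hypothetical in-memory stand-in that records each call's arguments.
class RecordingWriter implements RedisWriter {
  calls: unknown[][] = [];

  async set(key: string, value: string, mode?: "EX", seconds?: number): Promise<unknown> {
    this.calls.push(mode === undefined ? [key, value] : [key, value, mode, seconds]);
    return "OK";
  }
}

// Hypothetical publisher: each minted version gets its own key with a TTL,
// so superseded versions eventually expire instead of accumulating.
async function publishVersionedSnapshot(
  writer: RedisWriter,
  version: string,
  snapshot: object
): Promise<string> {
  const key = `public-status:v1:config:${version}`;
  await writer.set(key, JSON.stringify(snapshot), "EX", PUBLIC_STATUS_CONFIG_TTL_SECONDS);
  return key;
}
```

Against a real ioredis client the same call would translate to `SET key value EX 2592000`.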

Bundling this into the existing PR since all three bugs come from the same user report batch and are independent file-scoped fixes. Happy to split into a separate PR if preferred.

PR title and body updated accordingly. Pre-push re-run: build ✓, typecheck ✓, lint ✓, 5964 tests ✓.

@github-actions
Contributor

🧪 Test results

Test type Status
Code quality
Unit tests
Integration tests
API tests

Overall result: ✅ All tests passed

…tcomes

CodeRabbit and CI flagged two remaining gaps in the previous fix:

- The new 401 messages for invalid_api_key / key_disabled / key_expired
  were hardcoded Chinese strings, ignoring the project's i18n guideline
  (5 locales: zh-CN, zh-TW, en, ja, ru). The pre-existing strings in the
  same files (e.g. user_disabled, user_expired) were already hardcoded
  before this PR and remain so — see PR scope note — but the new
  branches should follow the established pattern.
- /v1/models had no unit coverage for the new key_disabled / key_expired
  branches in handleAvailableModels.authenticateRequest. A regression
  back to a generic invalid_api_key would have gone undetected.

Add PROXY_INVALID_API_KEY, PROXY_API_KEY_DISABLED, PROXY_API_KEY_EXPIRED
codes to AUTH_ERRORS plus translations across all 5 locales. Wire
auth-guard.ts and available-models.ts through getErrorMessageServer +
next-intl/server's getLocale — same pattern the neighbouring
rate-limit-guard.ts already uses.

Add tests/unit/models/available-models-auth-outcome.test.ts covering
all five 401 branches (key_disabled, key_expired, not_found,
user_disabled, user_expired). Existing auth-guard tests mock
next-intl/server + getErrorMessageServer so the unit tests can run
outside a Next.js request context, and message assertions pin on the
ERROR_CODES key (the localized text lives in messages/<locale>/errors.json).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ding113
Owner Author

ding113 commented May 14, 2026

Pushed c983b640 addressing the second-round review findings:

[HIGH] TEST-MISSING-CRITICAL — /v1/models lacks key_disabled / key_expired coverage
Added tests/unit/models/available-models-auth-outcome.test.ts covering all five 401 branches in handleAvailableModels.authenticateRequest: key_disabled, key_expired, not_found, user_disabled, user_expired. A regression back to a generic invalid_api_key would now be caught.

[MEDIUM] STANDARD-VIOLATION — Hardcoded auth error messages
Reversing my earlier defer: the codebase has an established i18n pattern (ERROR_CODES + getErrorMessageServer + next-intl/server's getLocale) and the neighbouring rate-limit-guard.ts already uses it. Added 3 new codes (PROXY_INVALID_API_KEY, PROXY_API_KEY_DISABLED, PROXY_API_KEY_EXPIRED) under AUTH_ERRORS, translations across all 5 locales (zh-CN, zh-TW, en, ja, ru), and wired both auth-guard.ts and available-models.ts through getErrorMessageServer. Scope note: the pre-existing strings in the same files (e.g. user_disabled, user_expired) are still hardcoded — they were that way before this PR; converting them is a separate small refactor that doesn't belong in this bug-fix branch.

The auth-guard test suite mocks next-intl/server + getErrorMessageServer so unit tests can run outside a Next.js request context; message assertions pin on the ERROR_CODES key (the localized text lives in messages/<locale>/errors.json).
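
As a rough illustration of why the tests pin on the error code rather than on the text, here is a sketch of a code-to-message lookup with a fallback. The catalog contents, the English wording, and `resolveErrorMessage` are all hypothetical; the project's `getErrorMessageServer` has its own signature and reads from the messages files.

```typescript
type Locale = "zh-CN" | "zh-TW" | "en" | "ja" | "ru";
type AuthErrorCode =
  | "PROXY_INVALID_API_KEY"
  | "PROXY_API_KEY_DISABLED"
  | "PROXY_API_KEY_EXPIRED";

// Hypothetical in-memory catalog with illustrative English text only.
const catalog: Partial<Record<Locale, Partial<Record<AuthErrorCode, string>>>> = {
  en: {
    PROXY_INVALID_API_KEY: "Invalid API key.",
    PROXY_API_KEY_DISABLED: "This API key has been disabled.",
    PROXY_API_KEY_EXPIRED: "This API key has expired.",
  },
};

// Looks up the requested locale, then English, then falls back to the code itself.
function resolveErrorMessage(locale: Locale, code: AuthErrorCode): string {
  return catalog[locale]?.[code] ?? catalog.en?.[code] ?? code;
}
```

A test that asserts on `PROXY_API_KEY_DISABLED` keeps passing when a translator rewords the message, which is the property the mocked suite relies on.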

Pre-push: build ✓, typecheck ✓, lint ✓, 5969 unit tests ✓.

Comment on lines 82 to 90
}

interface RedisWriter {
// ioredis supports both bare `set(key, value)` and the EX-variant
// `set(key, value, "EX", seconds)`. Widening the type here lets us pass
// an explicit TTL on every write — see PUBLIC_STATUS_CONFIG_TTL_SECONDS.
set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown> | unknown;
set(key: string, value: string): Promise<unknown> | unknown;
get?(key: string): Promise<string | null> | string | null;

P1 Current-pointer keys will expire on idle deployments

buildPublicStatusConfigSnapshotKey(), buildPublicStatusInternalConfigSnapshotKey(), and buildPublicStatusConfigVersionPointerKey() are current pointer keys — they always point at the latest config and should persist until explicitly overwritten. Applying the same 30-day TTL to them as to the accumulating versioned snapshot keys means that any deployment that goes 30 days without a config change loses its pointer keys, causing readCurrentPublicStatusConfigSnapshot to return null and the public status page to silently show no config data.

The versioned snapshot keys (public-status:v1:config:<version>) are the ones that accumulate and need a TTL; the three pointer keys should either carry no TTL or a much longer one (e.g., 1 year), since they are overwritten atomically on every config publish and never accumulate.
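
The idle-deployment failure can be reproduced with a small in-memory model of key expiry. `FakeTtlStore` and the key names are hypothetical and stand in for Redis; this is a sketch of the timing argument, not of the project's code.

```typescript
// In-memory store with a controllable clock; a stand-in for Redis expiry.
class FakeTtlStore {
  private nowSeconds = 0;
  private entries = new Map<string, { value: string; expiresAt: number | null }>();

  // Moves the simulated clock forward.
  advance(seconds: number): void {
    this.nowSeconds += seconds;
  }

  // Without ttlSeconds the key persists until overwritten.
  set(key: string, value: string, ttlSeconds?: number): void {
    const expiresAt = ttlSeconds === undefined ? null : this.nowSeconds + ttlSeconds;
    this.entries.set(key, { value, expiresAt });
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && entry.expiresAt <= this.nowSeconds) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }
}

const DAY_SECONDS = 24 * 60 * 60;
const store = new FakeTtlStore();
store.set("pointer:with-ttl", "v7", 30 * DAY_SECONDS); // pointer written with a 30-day TTL
store.set("pointer:no-ttl", "v7"); // pointer written with a bare SET

store.advance(31 * DAY_SECONDS); // 31 days pass with no config publish

const expiredPointer = store.get("pointer:with-ttl"); // null: the pointer is gone
const persistentPointer = store.get("pointer:no-ttl"); // "v7": still readable
```

Since a pointer key is overwritten in place and never accumulates, leaving it without a TTL does not reintroduce the leak.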

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c983b6402a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +230 to +233
buildPublicStatusConfigSnapshotKey(),
- JSON.stringify({ key, configVersion: snapshot.configVersion })
+ JSON.stringify({ key, configVersion: snapshot.configVersion }),
+ "EX",
+ PUBLIC_STATUS_CONFIG_TTL_SECONDS

P2 Keep public-status current pointers from expiring

Applying EX PUBLIC_STATUS_CONFIG_TTL_SECONDS to the config:current pointer means a stable deployment with no config publishes for 30 days will lose its only pointer to the latest snapshot, after which readCurrentPublicStatusConfigSnapshot falls through to null and endpoints like /api/public-site-meta report projection_missing. The leak fix only needs TTL on versioned snapshot keys; expiring the single “current” pointer key introduces a time-based availability regression.

@github-actions
Contributor

🧪 Test results

Test type Status
Code quality
Unit tests
Integration tests
API tests

Overall result: ✅ All tests passed

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/unit/models/available-models-auth-outcome.test.ts (1)

157-157: ⚡ Quick win

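Replace as never with an explicit type or satisfies to preserve type constraints

The as never casts on lines 157 and 173 bypass TypeScript's type checking, which weakens the tests' ability to catch changes to the return type of resolveApiKeyAuthOutcome. Consider using an explicit type or satisfies ApiKeyAuthOutcome so that the mock return values stay in sync with the actual type definition.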
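
A brief sketch of the difference, assuming a simplified `ApiKeyAuthOutcome` (the real `ok` variant presumably carries user and key data, omitted here). It requires TypeScript 4.9 or later for `satisfies`.

```typescript
// Simplified, hypothetical version of the union from src/repository/key.ts.
type ApiKeyAuthOutcome =
  | { ok: true }
  | { ok: false; reason: "not_found" | "key_disabled" | "key_expired" };

// `satisfies` checks the literal against the union without widening it.
// A typo such as reason: "key_disbled" would be rejected at compile time,
// whereas `as never` would let it through silently.
const disabledOutcome = { ok: false, reason: "key_disabled" } satisfies ApiKeyAuthOutcome;
const expiredOutcome = { ok: false, reason: "key_expired" } satisfies ApiKeyAuthOutcome;

// Hypothetical consumer, standing in for code that receives the mocked value.
function describeOutcome(outcome: ApiKeyAuthOutcome): string {
  return outcome.ok ? "ok" : outcome.reason;
}
```

If a later change renames or removes a reason, every mock written this way fails to compile and points at the test that needs updating.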

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 26d3b9f and c983b64.

📒 Files selected for processing (11)
  • messages/en/errors.json
  • messages/ja/errors.json
  • messages/ru/errors.json
  • messages/zh-CN/errors.json
  • messages/zh-TW/errors.json
  • src/app/v1/_lib/models/available-models.ts
  • src/app/v1/_lib/proxy/auth-guard.ts
  • src/lib/utils/error-messages.ts
  • tests/unit/models/available-models-auth-outcome.test.ts
  • tests/unit/proxy/auth-guard-account-state.test.ts
  • tests/unit/proxy/auth-guard-precheck.test.ts
✅ Files skipped from review due to trivial changes (5)
  • messages/en/errors.json
  • messages/zh-TW/errors.json
  • messages/zh-CN/errors.json
  • messages/ja/errors.json
  • messages/ru/errors.json
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/proxy/auth-guard-precheck.test.ts
  • src/app/v1/_lib/proxy/auth-guard.ts
  • src/app/v1/_lib/models/available-models.ts
  • tests/unit/proxy/auth-guard-account-state.test.ts

greptile flagged a P1: applying the same 30-day TTL to the three
"current pointer" keys would dark out any deployment that goes longer
than the TTL without publishing a new config. Pointer keys don't
accumulate — only one entry per pointer name exists, overwritten
atomically on every publish — so they MUST persist until explicitly
overwritten.

Only the versioned snapshot keys
(`public-status:v1:config:<version>` and `:config-internal:<version>`)
keep the 30-day TTL — those are the ones that accumulate as new config
versions are minted. The pointer publisher's Lua script and JS fallback
now write bare `SET` without `EX`.

Tests assert the split: the versioned write carries `EX <ttl>` and the
pointer write is a bare two-arg `set(key, value)` with no TTL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cd75941a4

if (redis) {
- await redis.set(key, JSON.stringify(snapshot));
+ // Versioned snapshot key: TTL'd so old versions get cleaned up.
+ await redis.set(key, JSON.stringify(snapshot), "EX", PUBLIC_STATUS_CONFIG_TTL_SECONDS);

P2 Keep latest public config snapshot from expiring

Applying EX PUBLIC_STATUS_CONFIG_TTL_SECONDS to the versioned config snapshot means the currently referenced snapshot disappears after 30 days if no config republish occurs. In the normal publish flow (publishCurrentPublicStatusConfigProjection), both publish*ConfigSnapshot calls pass setCurrentPointer: false, so reads depend on configVersion + public-status:v1:config:<version>; once that versioned key expires, readCurrentPublicStatusConfigSnapshot returns null and endpoints like /api/public-site-meta fall back to projection_missing until another rebuild/publish path runs.

@github-actions
Contributor

🧪 Test results

Test type Status
Code quality
Unit tests
Integration tests
API tests

Overall result: ✅ All tests passed

@ding113 ding113 merged commit b0c9eaf into dev May 14, 2026
12 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Claude Code Hub Roadmap May 14, 2026

Labels

area:core, area:provider, area:Rate Limit, bug (Something isn't working), size/L (Large PR, < 1000 lines)

Projects

Status: Done
