fix(proxy): release failed provider sessions by ding113 · Pull Request #1108 · ding113/claude-code-hub

ding113 · 2026-04-25T14:30:03Z

Summary

Release provider-level active session ZSET entries immediately when a provider fails and the proxy falls back. Prevents outage storms from leaving failed provider sessions resident until TTL cleanup.

Problem

Provider concurrency is tracked before forwarding in provider:{id}:active_sessions. When providers were marked as failed (exhausted retries, network errors, or invalid responses), their active session entries were not released immediately. This caused:

Redis ZSET churn - Failed sessions remained in the sorted set until TTL cleanup
Inaccurate concurrency tracking - The active session count was inflated during provider outages
Outage storm amplification - When all providers were unavailable, the accumulated failed sessions exacerbated the Redis load

Solution

Introduce a new RateLimitService.releaseProviderSession() method that removes a session from the provider's active_sessions ZSET when the provider is marked as failed. The ProxyForwarder now calls a new markProviderFailed() helper (replacing direct failedProviderIds.push()) which:

Adds the provider to the failed list (existing behavior)
Immediately releases the provider session from Redis (new behavior)

Changes

Core Changes

src/lib/rate-limit/service.ts (+30/-0) - Add releaseProviderSession() static method with defensive checks and error logging
src/app/v1/_lib/proxy/forwarder.ts (+30/-7) - Replace 7 direct failedProviderIds.push() calls with markProviderFailed(); add new private static helper

Test Coverage

tests/unit/lib/rate-limit/provider-session-release.test.ts (new, +82 lines) - Unit tests for releaseProviderSession covering:
- Successful session release from ZSET
- Redis unavailable / not ready handling
- Invalid providerId / empty sessionId rejection
- Error logging without throwing
tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts (new, +54 lines) - Integration tests for markProviderFailed covering:
- Session release on provider failure
- Null sessionId handling (no Redis call)

Related Issues/PRs

Related to fix(proxy): tighten numeric error status boundaries #1104 - Both address proxy failover reliability improvements
Related to fix(proxy): classify errors by rule category to preserve failover for service errors #1103 - Companion fix in the provider failover area
Related to Bug: 总供应商数: 0 个的时候一直等到超时结束 #1016 - Provider timeout handling improvements (different root cause but same area)

Verification

# LSP diagnostics: no diagnostics for modified files
bunx vitest run \
  tests/unit/lib/rate-limit/provider-session-release.test.ts \
  tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts \
  tests/unit/proxy/proxy-forwarder-retry-limit.test.ts \
  tests/unit/proxy/proxy-forwarder.test.ts

bun run lint      # Biome check
bun run typecheck # TypeScript validation
bun run test      # Full test suite
bun run build     # Production build

Testing

Automated Tests

Unit tests added for RateLimitService.releaseProviderSession()
Integration tests added for ProxyForwarder.markProviderFailed()
Regression tests pass (existing forwarder tests)

Manual Testing

Not applicable - this is a background reliability fix with no UI changes.

Breaking Changes

None. This fix only affects internal Redis state cleanup timing; no APIs or behaviors change.

Checklist

Code follows project conventions (no emoji, i18n where applicable)
Self-review completed
Tests added and passing
No breaking changes
No UI changes (screenshots N/A)
Target branch is dev

Description enhanced by Claude AI

Greptile Summary

This PR fixes a Redis ZSET leak where failed provider sessions remained in provider:{id}:active_sessions until TTL expiry, inflating concurrency counts during outage storms. The fix introduces reference-counted release: a new active_session_refs hash tracks how many in-flight acquisitions each session holds for a given provider, and RELEASE_PROVIDER_SESSION only executes ZREM when the last reference is dropped. All existing cleanup paths are updated to also clean the ref hash, maintaining consistency.

Confidence Score: 5/5

Safe to merge; changes are additive, all failure cases fail-open, and the fix is well-tested.

No P0 or P1 issues found. The ref-counting design is carefully guarded: consumeProviderSessionRef prevents double-release within a request, failedProviderIds.includes() deduplicates calls to markProviderFailed, and releaseProviderSession is fire-and-forget with error logging so it cannot break the request path.

Minor attention to src/app/v1/_lib/proxy/session.ts (dead null guard) and src/app/v1/_lib/proxy/forwarder.ts (duck-typed consumeProviderSessionRef access in markProviderFailed).

Important Files Changed

Filename	Overview
src/app/v1/_lib/proxy/forwarder.ts	Replaces 7 direct failedProviderIds.push() calls with markProviderFailed(); adds concurrent-session acquisition in startAttempt for hedge providers; converts launchingAlternative from one-shot to retry while-loop. Logic is sound with proper guards against double-release.
src/lib/redis/lua-scripts.ts	Extends CHECK_AND_TRACK_SESSION to 2 keys (ZSET + ref hash), adds ref-count increment with referenced return value and expired-session cleanup of ref hash. Adds new RELEASE_PROVIDER_SESSION script using reference counting to safely ZREM when last ref is released.
src/lib/rate-limit/service.ts	Extends checkAndTrackProviderSession return type with referenced field; adds releaseProviderSession static method with defensive input validation, Redis readiness check, and error logging without throw.
src/app/v1/_lib/proxy/session.ts	Adds providerSessionRefs Set field initialized as class property; adds recordProviderSessionRef and consumeProviderSessionRef methods with a redundant null guard on the already-initialized field.
src/lib/session-tracker.ts	Adds ref-hash cleanup alongside every ZSET cleanup path; refreshes ref-hash TTL in updateProvider and refreshSession. New getProviderActiveSessionRefsKey helper pattern-matches keys correctly.

Sequence Diagram

sequenceDiagram
    participant PS as provider-selector
    participant PF as ProxyForwarder
    participant RL as RateLimitService
    participant RD as Redis
    PS->>RL: checkAndTrackProviderSession
    RL->>RD: CHECK_AND_TRACK_SESSION Lua
    RD-->>RL: allowed=1 referenced=1
    RL-->>PS: allowed true referenced true
    PS->>PS: session.recordProviderSessionRef
    Note over PF: Provider fails
    PF->>PF: markProviderFailed
    PF->>RL: releaseProviderSession
    RL->>RD: RELEASE_PROVIDER_SESSION Lua
    RD-->>RL: removed=1 remainingRefs=0
    Note over PF: Session terminates
    PF->>RD: ZREM active_sessions + HDEL active_session_refs

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/session.ts
Line: 320-322

Comment:
**Unreachable null guard on class-initialized field**

`providerSessionRefs` is declared as a class field initializer (`private providerSessionRefs = new Set<number>()`), so it can never be `null` or `undefined` here. The `if (!this.providerSessionRefs)` branch is dead code and could mislead future readers into thinking the field might be absent.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 4300-4318

Comment:
**Duck-typed access to `consumeProviderSessionRef` bypasses TypeScript safety**

`markProviderFailed` casts `session` to an anonymous structural type to reach `consumeProviderSessionRef`, rather than referencing it through `ProxySession`'s declared interface. If the method is ever renamed or removed from `ProxySession`, TypeScript will not catch the silent no-op — the optional-chaining guard `?.call(...)` will evaluate to `undefined` (falsy) and skip the Redis release without any compile-time or runtime error.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/lib/redis/lua-scripts.ts
Line: 72-76

Comment:
**Ref increment for already-tracked sessions can accumulate across concurrent requests**

The condition `if not is_tracked or existing_refs > 0` increments refs when the session is already in the ZSET and has an existing ref count. Within a single request this is safe because `launchedProviderIds` prevents re-selecting the same provider. However, two independent HTTP requests sharing the same `sessionId` racing against the same provider will each increment refs, requiring two `RELEASE_PROVIDER_SESSION` calls before the ZSET member is removed. A clarifying inline comment on the multi-caller invariant would help future readers.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (3): Last reviewed commit: "fix(rate-limit): preserve provider sessi..." | Re-trigger Greptile}

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

coderabbitai · 2026-04-25T14:30:14Z

📝 Walkthrough

Walkthrough

在转发器中集中化失败提供商记录为 markProviderFailed(...)，新增按会话引用释放提供商会话的 RateLimit 方法与 Redis Lua/引用计数支持，并在会话追踪、会话终止和多处 hedge 流程中使用该机制；相应单元测试已扩展覆盖新行为。

Changes

Cohort / File(s)	Summary
Proxy forwarder `src/app/v1/_lib/proxy/forwarder.ts`	引入私有 `markProviderFailed(...)`，统一失败提供商去重记录并在有 session.sessionId 时调用释放；流式 hedge 路径重构以循环选择替代、把 `startAttempt` 改为 `Promise<boolean>` 并在并发受限时记录链路条目与标记失败。
Proxy session / selector `src/app/v1/_lib/proxy/session.ts`, `src/app/v1/_lib/proxy/provider-selector.ts`	`ProxySession` 增加 `providerSessionRefs` 集合与 `recordProviderSessionRef` / `consumeProviderSessionRef` 方法；选择器在 `ensure` 成功且 `checkResult.referenced` 为真时记录会话引用。
Rate limit & Redis Lua `src/lib/rate-limit/service.ts`, `src/lib/redis/lua-scripts.ts`	`checkAndTrackProviderSession` 返回包含 `referenced` 字段，改为使用两个 Redis key（active_sessions 与 active_session_refs）；新增 `releaseProviderSession(providerId, sessionId)`，并新增 `RELEASE_PROVIDER_SESSION` Lua 脚本以按会话引用递减并在需要时从 ZSET/HASH 中移除。
Session tracking/termination `src/lib/session-tracker.ts`, `src/lib/session-manager.ts`	会话更新/批量统计与终止清理扩展为同时维护与删除 `provider:<id>:active_session_refs` 的引用条目并在清理流程中删除过期 refs。
Unit tests — rate limit & session release `tests/unit/lib/rate-limit/provider-session-release.test.ts`, `tests/unit/lib/rate-limit/service-extra.test.ts`	新增 `releaseProviderSession` 测试覆盖成功、引用残留、Redis 不可用、无效输入与 Redis 错误日志；更新 `checkAndTrackProviderSession` 测试以断言新增的 `referenced` 语义与新的 Lua 调用签名。
Unit tests — proxy / hedge `tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts`, `tests/unit/proxy/proxy-forwarder-hedge-first-byte.test.ts`	为 `markProviderFailed` 添加去重与 sessionId 有/无场景测试；hedge 测试新增断言确保释放失败/中止/被跳过提供商的会话（例如 `"sess-hedge"`），并验证并发受限时跳过提供商行为。
Unit tests — session tracker/manager mocks `tests/unit/lib/session-tracker-cleanup.test.ts`, `tests/unit/lib/session-manager-terminate-session.test.ts`	扩展测试 Redis mock（新增 `hdel`, `zrangebyscore` 等）以支持 refs 清理与终止路径的 pipeline 调用。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix: bound SessionTracker active_sessions zsets by env TTL #718 — 修改了 RateLimitService.checkAndTrackProviderSession 与 CHECK_AND_TRACK_SESSION Lua 脚本的键/返回值，可能与本次对 Lua 与返回元组扩展直接冲突或交互。
fix: run reactive rectifiers during streaming hedge #945 — 也变更了 src/app/v1/_lib/proxy/forwarder.ts 的流式 hedge 逻辑，可能与本次对 hedge 重构（并发门控、重试/选择循环）发生代码级冲突。
feat: 添加供应商查询缓存, 改善性能 #554 — 涉及代理会话与提供商选择流程的改动，可能与本次在 ProxySession/provider-reference 处理上的改变存在重叠。

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	标题清晰准确地总结了主要变更：即时释放失败提供者的会话，这正是整个PR的核心目标。
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	PR描述清晰完整，详细说明了问题、解决方案、修改内容、测试覆盖和验证步骤，与代码变更高度相关。

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/redis-provider-failure-20260425

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a mechanism to immediately release provider-level active sessions when a provider fails, preventing 'outage storms' from inflating concurrency counts. It adds a releaseProviderSession method to the RateLimitService and integrates it into the ProxyForwarder via a new markProviderFailed helper. Feedback suggests converting this operation to a fire-and-forget pattern to ensure that Redis latency or failures do not block the critical provider failover logic.

gemini-code-assist · 2026-04-25T14:32:26Z

+  private static async markProviderFailed(
+    session: ProxySession,
+    failedProviderIds: number[],
+    providerId: number
+  ): Promise<void> {
+    failedProviderIds.push(providerId);
+
+    if (!session.sessionId) {
+      return;
+    }
+
+    await RateLimitService.releaseProviderSession(providerId, session.sessionId);
+  }


For better performance during provider failover, consider making this a fire-and-forget operation. The releaseProviderSession call to Redis doesn't need to block the provider switching logic, especially during high load or Redis latency spikes. The releaseProviderSession method is designed to be safe for this as it handles its own errors.

By making this method synchronous and using void for the async call, you can improve the resilience of the failover process.

Note: Applying this suggestion will require removing await from all call sites of markProviderFailed.

Suggested change

private static async markProviderFailed(

session: ProxySession,

failedProviderIds: number[],

providerId: number

): Promise<void> {

failedProviderIds.push(providerId);

if (!session.sessionId) {

return;

}

await RateLimitService.releaseProviderSession(providerId, session.sessionId);

}

private static markProviderFailed(

session: ProxySession,

failedProviderIds: number[],

providerId: number

): void {

failedProviderIds.push(providerId);

if (!session.sessionId) {

return;

}

// Fire-and-forget to avoid blocking failover on this Redis command.

void RateLimitService.releaseProviderSession(providerId, session.sessionId);

}

Fixed in a9fcbd6: markProviderFailed is now synchronous and uses fire-and-forget releaseProviderSession, so Redis cleanup no longer blocks provider failover.

github-actions · 2026-04-25T14:34:03Z

🧪 测试结果

测试类型	状态
代码质量	✅
单元测试	✅
集成测试	✅
API 测试	✅

总体结果: ✅ 所有测试通过

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 4262-4274: The helper markProviderFailed currently adds providerId
to failedProviderIds and releases the session via
RateLimitService.releaseProviderSession but is not invoked from the streaming
hedge path; update sendStreamingWithHedge so that whenever the first attempt's
provider fails or an alternative provider wins, it calls
markProviderFailed(session, failedProviderIds, providerId) (using the same
ProxySession and failedProviderIds passed into sendStreamingWithHedge) before
switching providers; ensure both the failure branch and the "alternative
provider won" completion path in sendStreamingWithHedge use this helper so the
original provider's active session is released and no stale
provider:{id}:active_sessions entries remain.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b5759987-5710-464d-972b-2096ee18b76a

📥 Commits

Reviewing files that changed from the base of the PR and between 1381474 and f454dcd.

📒 Files selected for processing (4)

src/app/v1/_lib/proxy/forwarder.ts
src/lib/rate-limit/service.ts
tests/unit/lib/rate-limit/provider-session-release.test.ts
tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts

github-actions

Code Review Summary

No significant issues identified in this PR.

PR Size: M

Lines changed: 203
Files changed: 4

Review Coverage

Automated review by Codex AI

github-actions

Reviewed PR #1108 (fix(proxy): release failed provider sessions) diff across 4 files (src/app/v1/_lib/proxy/forwarder.ts, src/lib/rate-limit/service.ts, and 2 new unit tests).
Applied PR size label size/M (203 lines changed, 4 files).
No diff-line issues met the reporting threshold, so no inline review comments were posted.
Submitted the required summary review comment via gh pr review (visible on the PR).

github-actions

Code Review Summary

No significant issues identified in this PR. The implementation correctly addresses the Redis ZSET churn problem by immediately releasing provider sessions when providers fail, preventing outage storms from leaving failed sessions resident until TTL cleanup.

PR Size: M

Lines changed: 203 (196 additions, 7 deletions)
Files changed: 4

Review Coverage

Implementation Highlights

RateLimitService.releaseProviderSession() (src/lib/rate-limit/service.ts:859-886)

Properly validates inputs (providerId > 0, non-empty sessionId)
Defensively checks Redis readiness before operation
Correctly uses zrem to remove session from provider ZSET
Errors are logged but not thrown (appropriate for cleanup operations)
JSDoc comment accurately explains the purpose

ProxyForwarder.markProviderFailed() (src/app/v1/_lib/proxy/forwarder.ts)

Replaces 7 direct failedProviderIds.push() calls consistently
Checks for null sessionId before attempting Redis release
Proper async/await handling without blocking failover flow

Test Coverage

Unit tests cover: successful release, Redis unavailable, invalid inputs, error handling
Integration tests cover: session release on failure, null sessionId handling

Notes

The early-throw path (shouldSkipRawRetryAndProviderSwitch) intentionally does not call markProviderFailed as the request terminates immediately rather than falling back; session cleanup will occur via TTL.
Async overhead in markProviderFailed is acceptable as it only occurs during failover scenarios.

Automated review by Claude AI

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9fcbd6225

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-25T14:56:13Z

+      return;
+    }
+
+    void RateLimitService.releaseProviderSession(providerId, session.sessionId);


Preserve provider session membership for concurrent requests

markProviderFailed now always calls releaseProviderSession, which does a raw ZREM by sessionId; because provider concurrency is tracked as one ZSET member per session (via checkAndTrackProviderSession), this removes the provider session for all in-flight requests sharing that session. If one concurrent request fails over while another request from the same session is still actively using that provider, the active session count is under-reported and new sessions can be admitted past the configured provider concurrency limit until the next refresh/retrack.

Useful? React with 👍 / 👎.

github-actions · 2026-04-25T14:56:13Z

🧪 测试结果

测试类型	状态
代码质量	✅
单元测试	✅
集成测试	✅
API 测试	✅

总体结果: ✅ 所有测试通过

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/app/v1/_lib/proxy/forwarder.ts (1)

3442-3450: ⚠️ Potential issue | 🟠 Major

abortAttempt 只在 hedge loser 分支释放 provider session

这里的清理条件太窄了。abortAllAttempts() 还会在 Line 3546 用 "hedge_launch_failed"、在 Line 4013 用 "client_abort" 终止所有 in-flight attempt，但这两条路径都不会走 markProviderFailed()，因此对应的 provider:{id}:active_sessions 仍会残留到 TTL。这样在 selector 故障或客户端提前断开时，新的并发计数膨胀问题还会继续出现。

建议修改

       attempts.delete(attempt);
-      if (reason === "hedge_loser") {
+      const shouldReleaseProviderSession =
+        reason === "hedge_loser" ||
+        reason === "hedge_launch_failed" ||
+        reason === "client_abort";
+
+      if (reason === "hedge_loser") {
         session.addProviderToChain(attempt.provider, {
           ...attempt.endpointAudit,
           reason: "hedge_loser_cancelled",
           attemptNumber: attempt.sequence,
           modelRedirect: getAttemptModelRedirect(attempt),
         });
-        ProxyForwarder.markProviderFailed(session, failedProviderIds, attempt.provider.id);
+      }
+
+      if (shouldReleaseProviderSession) {
+        ProxyForwarder.markProviderFailed(session, failedProviderIds, attempt.provider.id);
       }

Also applies to: 3543-3547, 3994-4014

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/app/v1/_lib/proxy/forwarder.ts` around lines 3442 - 3450, The abort logic
only calls ProxyForwarder.markProviderFailed(...) when reason === "hedge_loser",
leaving provider:{id}:active_sessions stuck for other abort reasons
("hedge_launch_failed", "client_abort"); update abortAttempt/abortAllAttempts to
always perform the same provider-session cleanup: after
session.addProviderToChain(...) (or wherever attempts are terminated) call
ProxyForwarder.markProviderFailed(session, failedProviderIds,
attempt.provider.id) (or collect attempt.provider.id into failedProviderIds and
call markProviderFailed once for each aborted attempt) so every aborted attempt
path (including hedge_launch_failed and client_abort) decrements/cleans provider
active session state consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 3442-3450: The abort logic only calls
ProxyForwarder.markProviderFailed(...) when reason === "hedge_loser", leaving
provider:{id}:active_sessions stuck for other abort reasons
("hedge_launch_failed", "client_abort"); update abortAttempt/abortAllAttempts to
always perform the same provider-session cleanup: after
session.addProviderToChain(...) (or wherever attempts are terminated) call
ProxyForwarder.markProviderFailed(session, failedProviderIds,
attempt.provider.id) (or collect attempt.provider.id into failedProviderIds and
call markProviderFailed once for each aborted attempt) so every aborted attempt
path (including hedge_launch_failed and client_abort) decrements/cleans provider
active session state consistently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78251726-0608-47e1-aac4-96d51fcb3ce0

📥 Commits

Reviewing files that changed from the base of the PR and between f454dcd and a9fcbd6.

📒 Files selected for processing (3)

src/app/v1/_lib/proxy/forwarder.ts
tests/unit/proxy/proxy-forwarder-hedge-first-byte.test.ts
tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

github-actions · 2026-04-25T16:11:38Z

🧪 测试结果

测试类型	状态
代码质量	✅
单元测试	✅
集成测试	✅
API 测试	✅

总体结果: ✅ 所有测试通过

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5375ce4eb5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-25T16:16:16Z

      const key = `provider:${providerId}:active_sessions`;
+      const refKey = `provider:${providerId}:active_session_refs`;
      const now = Date.now();


Hash-tag provider keys before multi-key Lua eval

checkAndTrackProviderSession now executes Lua with two keys (provider:${providerId}:active_sessions and provider:${providerId}:active_session_refs) that do not share a hash tag, so Redis Cluster deployments will return CROSSSLOT for this eval. Because this method catches the error and fail-opens, provider concurrent-session limits are silently bypassed (and the paired release script fails similarly), which can overload providers in cluster-backed environments.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-25T16:16:16Z

+if not is_tracked or existing_refs > 0 then
+  redis.call('HINCRBY', ref_key, session_id, 1)
+  referenced = 1


Stop ref-count growth across successful provider requests

The Lua tracking script increments active_session_refs for every request once a ref exists, but refs are only decremented on markProviderFailed/releaseProviderSession; successful requests never decrement. In a normal session that has successes before a failover, this causes ref counts to accumulate indefinitely, so a later failure decrements by only one and leaves the failed provider session resident in active_sessions, keeping concurrency counts inflated and defeating the intended immediate cleanup.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

src/lib/rate-limit/service.ts (1)
877-910: releaseProviderSession 在 Redis 暂时不可用时是静默 no-op，建议确认是否可接受。

当 redis.status !== "ready" 时直接 return，不抛错也不记录任何日志（连 debug/warn 都没有）。这意味着如果 provider 被标记为失败时 Redis 恰好处于断连/重连阶段，对应的 active session 既不会被立即释放，也不会留下任何观测线索，只能等待 ZSET TTL 自然过期。

考虑到本 PR 的目标恰恰是“避免 lingering sessions 等到 TTL”，建议在该分支至少打一条 logger.warn，便于后续排查为何释放未执行：
📝 建议改动
   const redis = RateLimitService.redis;
   if (!redis || redis.status !== "ready") {
+    logger.warn("[RateLimit] Redis not ready, skip releaseProviderSession", {
+      providerId,
+      sessionId,
+    });
     return;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/rate-limit/service.ts` around lines 877 - 910, The function
releaseProviderSession silently returns when Redis is not ready; change it to
log a warning so failures to release sessions are observable: inside
RateLimitService.releaseProviderSession, detect the early-return branch where
RateLimitService.redis is missing or redis.status !== "ready" and add a
logger.warn (including providerId and sessionId) before returning (do not
convert this into throwing an error unless intended). Ensure the warning message
clearly states Redis is unavailable and session release was skipped so operators
can trace lingering sessions.
tests/unit/lib/session-manager-terminate-session.test.ts (1)
30-30: 新增了 hdel mock，但没有断言 terminateSession 对 active_session_refs 的清理行为。

本 PR 在 SessionManager.terminateSession 中新增了 pipeline.hdel(\provider:${providerId}:active_session_refs`, sessionId)（src/lib/session-manager.ts L2436）。当前测试只在 mock 上加了 hdel占位，但没有任何expect(pipelineRef.hdel).toHaveBeenCalledWith(...)`，这会让该新清理路径处于无测试覆盖状态，未来回归不易被发现。
♻️ 建议在已有用例中追加断言
     expect(pipelineRef.zrem).toHaveBeenCalledWith("provider:42:active_sessions", sessionId);
+    expect(pipelineRef.hdel).toHaveBeenCalledWith(
+      "provider:42:active_session_refs",
+      sessionId
+    );
     expect(pipelineRef.zrem).toHaveBeenCalledWith(getKeyActiveSessionsKey(7), sessionId);
Also applies to: 42-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/lib/session-manager-terminate-session.test.ts` at line 30, 测试新增了
pipeline.hdel 的 mock，但没有断言 SessionManager.terminateSession 对 active_session_refs
的清理行为；请在对应单元测试（tests/unit/lib/session-manager-terminate-session.test.ts）中为
pipelineRef.hdel 增加断言，验证 terminateSession 调用时 pipelineRef.hdel 被以正确参数调用（即键
"provider:${providerId}:active_session_refs" 和 sessionId），使用
expect(pipelineRef.hdel).toHaveBeenCalledWith(...) 或等效匹配以覆盖该清理路径。
src/lib/redis/lua-scripts.ts (1)
23-27: 注释中拒绝路径未提及 referenced=0，建议补全契约说明。

第 27 行只描述返回值含义为 tracked=0，但实际返回元组是 {0, count, 0, 0}，最后一项也是 referenced=0。为保持与其他三种返回情况一致，建议在注释中明确这一点，避免后续维护者误解第四个返回值的含义。
📝 建议改动
- * - {0, count, 0, 0} - rejected (limit reached), returns current count and tracked=0
+ * - {0, count, 0, 0} - rejected (limit reached), returns current count, tracked=0, referenced=0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/redis/lua-scripts.ts` around lines 23 - 27, Update the return-value
comment block that documents the Lua script's tuple responses so the rejection
case explicitly notes referenced=0; specifically, change the fourth bullet from
"{0, count, 0, 0} - rejected (limit reached), returns current count and
tracked=0" to explicitly state referenced=0 as well (e.g. "{0, count, 0, 0} -
rejected (limit reached), returns current count, tracked=0, referenced=0") so
the documented contract for the tuple {flag, count, tracked, referenced} is
complete and consistent with the other bullets.
src/lib/session-tracker.ts (1)
439-492: 两阶段引用清理的索引/分支判断正确，但建议留意双 pipeline 顺序。

将每个 provider 的清理命令从 2 条扩为 3 条（zrangebyscore + zremrangebyscore + zrange），且使用 i * 3 / i * 3 + 2 取偏移，与 pipeline 顺序一致；hasRefCleanup 用于跳过空 pipeline 的 exec() 也合理。同样地，refCleanupPipeline.exec() 与上一个 cleanupPipeline 不在同一 round-trip 内，存在与上一处相同的小窗口（被 zremrangebyscore 标记为过期但被并发刷新的 sessionId 仍会在此被 HDEL）。本路径属于批量计数热路径，若评估到对统计准确性敏感，建议结合 Lua 合并这两个 pipeline。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/session-tracker.ts` around lines 439 - 492, The current two-pipeline
approach (cleanupPipeline and refCleanupPipeline) leaves a small race where a
session removed by zremrangebyscore can be concurrently refreshed and still get
HDEL'd; fix by combining the per-provider cleanup and ref deletion into a single
atomic Redis script/command so zrangebyscore, zremrangebyscore, zrange (and hdel
of expired IDs) run in one round-trip. Replace the loop that builds
cleanupPipeline + refCleanupPipeline with a single EVAL (or multi-key Lua)
invocation that, for each providerId from providerIds, performs
zrangebyscore(key, "-inf", cutoffMs) to collect expired IDs,
zremrangebyscore(key, "-inf", cutoffMs) to remove them, zrange(key, 0, -1) to
get remaining sessions, and hdel("provider:{providerId}:active_session_refs",
expiredIds...) to remove their refs, and return both expired and remaining lists
so you can populate expiredProviderSessions, providerSessionMap, and
allSessionIds from the script result atomically.
tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts (1)
25-111: 测试场景对 markProviderFailed 的四个分支覆盖完整。

consumeProviderSessionRef → true/false、重复标记去重、sessionId === null 都覆盖到了，且每个场景都同时断言 failedProviderIds、consumeProviderSessionRef 调用次数和 releaseProviderSession 是否被调用，能很好地兜住该 helper 的契约。一个可选的补充用例（不强求）：在 sessionId 存在但 consumeProviderSessionRef 方法本身缺失（比如运行时 session 未升级到新接口）时，验证不抛异常——这正好对应 forwarder 中那段 as { consumeProviderSessionRef?: ... } 防御式断言所要保护的场景。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts` around
lines 25 - 111, Tests already cover four branches of markProviderFailed; add one
more test case to ensure it doesn't throw when session lacks
consumeProviderSessionRef: import ProxyForwarder, call
forwarderInternals.markProviderFailed with session = { sessionId:
"sess_no_consume" } cast as ProxySession (no consumeProviderSessionRef),
failedProviderIds = [], and providerId like 42, then assert failedProviderIds
contains 42 and mocks.releaseProviderSession behavior matches existing logic
(should be not called because no consume ref or only called when sessionId
present and consume returns true). This validates the defensive optional
signature for consumeProviderSessionRef in markProviderFailed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 4291-4315: The code in markProviderFailed uses a defensive type
assertion to access consumeProviderSessionRef but ProxySession already exposes
that method; remove the cast and call the method directly (mirror the style used
for session.recordProviderSessionRef), e.g. replace the casted
providerSessionRefConsumer usage with a direct invocation of
session.consumeProviderSessionRef(providerId), keep the same null/boolean check
and the subsequent call to
RateLimitService.releaseProviderSession(session.sessionId) when it returns true.

In `@src/lib/redis/lua-scripts.ts`:
- Around line 103-122: The RELEASE_PROVIDER_SESSION Lua script currently returns
early when HGET(ref_key, session_id) is missing or zero, leaving ZSET members
orphaned; modify RELEASE_PROVIDER_SESSION so that when current_refs <= 0 it
still performs redis.call('ZREM', provider_key, session_id) (remove the member
from the ZSET) and then return {removed, 0}; keep the existing logic for
remaining_refs > 0 and for the HDEL path, but ensure the early-return case does
ZREM as a fallback cleanup so legacy ZSET-only sessions are removed atomically
by the script.

In `@src/lib/session-tracker.ts`:
- Around line 257-267: The post-pipeline redis.hdel call (after pipeline.exec())
can race with concurrent refreshSession/checkAndTrackProviderSession updates and
produce ZSET refs/Hashmap inconsistencies; move the hdel into the same Redis
pipeline or combine the read/remove/hdel logic into a single atomic operation
(e.g., include the hdel in the same pipeline used around
cleanupExpiredSessionsResultIndex and providerRefKey or replace the sequence
with a Lua script) so that expiredResult processing
(cleanupExpiredSessionsResultIndex -> expiredSessionIds -> redis.hdel) becomes
atomic; apply the same fix pattern to countFromZSet where zrangebyscore,
zremrangebyscore and hdel are separate calls.

---

Nitpick comments:
In `@src/lib/rate-limit/service.ts`:
- Around line 877-910: The function releaseProviderSession silently returns when
Redis is not ready; change it to log a warning so failures to release sessions
are observable: inside RateLimitService.releaseProviderSession, detect the
early-return branch where RateLimitService.redis is missing or redis.status !==
"ready" and add a logger.warn (including providerId and sessionId) before
returning (do not convert this into throwing an error unless intended). Ensure
the warning message clearly states Redis is unavailable and session release was
skipped so operators can trace lingering sessions.

In `@src/lib/redis/lua-scripts.ts`:
- Around line 23-27: Update the return-value comment block that documents the
Lua script's tuple responses so the rejection case explicitly notes
referenced=0; specifically, change the fourth bullet from "{0, count, 0, 0} -
rejected (limit reached), returns current count and tracked=0" to explicitly
state referenced=0 as well (e.g. "{0, count, 0, 0} - rejected (limit reached),
returns current count, tracked=0, referenced=0") so the documented contract for
the tuple {flag, count, tracked, referenced} is complete and consistent with the
other bullets.

In `@src/lib/session-tracker.ts`:
- Around line 439-492: The current two-pipeline approach (cleanupPipeline and
refCleanupPipeline) leaves a small race where a session removed by
zremrangebyscore can be concurrently refreshed and still get HDEL'd; fix by
combining the per-provider cleanup and ref deletion into a single atomic Redis
script/command so zrangebyscore, zremrangebyscore, zrange (and hdel of expired
IDs) run in one round-trip. Replace the loop that builds cleanupPipeline +
refCleanupPipeline with a single EVAL (or multi-key Lua) invocation that, for
each providerId from providerIds, performs zrangebyscore(key, "-inf", cutoffMs)
to collect expired IDs, zremrangebyscore(key, "-inf", cutoffMs) to remove them,
zrange(key, 0, -1) to get remaining sessions, and
hdel("provider:{providerId}:active_session_refs", expiredIds...) to remove their
refs, and return both expired and remaining lists so you can populate
expiredProviderSessions, providerSessionMap, and allSessionIds from the script
result atomically.

In `@tests/unit/lib/session-manager-terminate-session.test.ts`:
- Line 30: 测试新增了 pipeline.hdel 的 mock，但没有断言 SessionManager.terminateSession 对
active_session_refs
的清理行为；请在对应单元测试（tests/unit/lib/session-manager-terminate-session.test.ts）中为
pipelineRef.hdel 增加断言，验证 terminateSession 调用时 pipelineRef.hdel 被以正确参数调用（即键
"provider:${providerId}:active_session_refs" 和 sessionId），使用
expect(pipelineRef.hdel).toHaveBeenCalledWith(...) 或等效匹配以覆盖该清理路径。

In `@tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts`:
- Around line 25-111: Tests already cover four branches of markProviderFailed;
add one more test case to ensure it doesn't throw when session lacks
consumeProviderSessionRef: import ProxyForwarder, call
forwarderInternals.markProviderFailed with session = { sessionId:
"sess_no_consume" } cast as ProxySession (no consumeProviderSessionRef),
failedProviderIds = [], and providerId like 42, then assert failedProviderIds
contains 42 and mocks.releaseProviderSession behavior matches existing logic
(should be not called because no consume ref or only called when sessionId
present and consume returns true). This validates the defensive optional
signature for consumeProviderSessionRef in markProviderFailed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e068c22-192f-470f-b18f-cbd885a3f8d9

📥 Commits

Reviewing files that changed from the base of the PR and between a9fcbd6 and 5375ce4.

📒 Files selected for processing (13)

src/app/v1/_lib/proxy/forwarder.ts
src/app/v1/_lib/proxy/provider-selector.ts
src/app/v1/_lib/proxy/session.ts
src/lib/rate-limit/service.ts
src/lib/redis/lua-scripts.ts
src/lib/session-manager.ts
src/lib/session-tracker.ts
tests/unit/lib/rate-limit/provider-session-release.test.ts
tests/unit/lib/rate-limit/service-extra.test.ts
tests/unit/lib/session-manager-terminate-session.test.ts
tests/unit/lib/session-tracker-cleanup.test.ts
tests/unit/proxy/proxy-forwarder-hedge-first-byte.test.ts
tests/unit/proxy/proxy-forwarder-provider-session-release.test.ts

coderabbitai · 2026-04-25T16:17:09Z

+  private static markProviderFailed(
+    session: ProxySession,
+    failedProviderIds: number[],
+    providerId: number
+  ): void {
+    if (failedProviderIds.includes(providerId)) {
+      return;
+    }
+
+    failedProviderIds.push(providerId);
+
+    if (!session.sessionId) {
+      return;
+    }
+
+    const providerSessionRefConsumer = (
+      session as { consumeProviderSessionRef?: (providerId: number) => boolean }
+    ).consumeProviderSessionRef;
+
+    if (!providerSessionRefConsumer?.call(session, providerId)) {
+      return;
+    }
+
+    void RateLimitService.releaseProviderSession(providerId, session.sessionId);
+  }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify ProxySession exposes recordProviderSessionRef and consumeProviderSessionRef on its public type. fd -t f 'session.ts' src/app/v1/_lib/proxy/ --exec rg -nP -C2 'recordProviderSessionRef|consumeProviderSessionRef|providerSessionRefs' {}

Repository: ding113/claude-code-hub

Length of output: 842

统一使用直接方法调用而非防御式类型断言

验证表明 recordProviderSessionRef 和 consumeProviderSessionRef 两个方法都已在 ProxySession 中公开定义（分别在 session.ts 的第 320 和 330 行），因此 forwarder.ts 第 4306-4310 行的防御式类型断言：

const providerSessionRefConsumer = ( session as { consumeProviderSessionRef?: (providerId: number) => boolean } ).consumeProviderSessionRef;

是冗余的。建议统一使用直接调用方式，保持与第 3954 行 session.recordProviderSessionRef(provider.id) 的风格一致。

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/app/v1/_lib/proxy/forwarder.ts` around lines 4291 - 4315, The code in markProviderFailed uses a defensive type assertion to access consumeProviderSessionRef but ProxySession already exposes that method; remove the cast and call the method directly (mirror the style used for session.recordProviderSessionRef), e.g. replace the casted providerSessionRefConsumer usage with a direct invocation of session.consumeProviderSessionRef(providerId), keep the same null/boolean check and the subsequent call to RateLimitService.releaseProviderSession(session.sessionId) when it returns true.

coderabbitai · 2026-04-25T16:17:09Z

+export const RELEASE_PROVIDER_SESSION = `
+local provider_key = KEYS[1]
+local ref_key = KEYS[2]
+local session_id = ARGV[1]
+
+local current_refs = tonumber(redis.call('HGET', ref_key, session_id) or '0')
+if current_refs <= 0 then
+  return {0, 0}
+end
+
+local remaining_refs = current_refs - 1
+if remaining_refs > 0 then
+  redis.call('HSET', ref_key, session_id, remaining_refs)
+  return {0, remaining_refs}
+end
+
+redis.call('HDEL', ref_key, session_id)
+local removed = redis.call('ZREM', provider_key, session_id)
+return {removed, remaining_refs}
+`;


⚠️ Potential issue | 🟡 Minor

遗留会员（在 ZSET 中但无引用计数）无法被 RELEASE_PROVIDER_SESSION 主动释放。

当 HGET ref_key session_id 返回 0 或缺失时，脚本直接 return {0, 0}，不会执行 ZREM provider_key。这意味着部署本 PR 之前由旧脚本写入的活跃 session（即 ZSET 里有 member 但 refs hash 中没有计数）在 provider 被标记为失败时无法被立即清理，只能等待 TTL 过期。

部署过渡期内，这些遗留会员会让供应商的 active_sessions 计数虚高，与本 PR 试图修复的“lingering sessions 直到 TTL”问题部分相同。建议：

方案 A（推荐）：当 current_refs <= 0 时仍执行 ZREM provider_key，作为兜底清理；将语义改为“无引用 → 直接移除 ZSET 会员”。

方案 B：保留当前行为，但在 RateLimitService.releaseProviderSession 中检测到 removed=0 && remainingRefs=0 时，额外调用一次 ZREM 兜底（注意非原子）。

♻️ 建议改动（方案 A）

local current_refs = tonumber(redis.call('HGET', ref_key, session_id) or '0') if current_refs <= 0 then - return {0, 0} + -- Legacy member without ref tracking (created before ref-count rollout): + -- best-effort remove from ZSET so eager release path still works during migration. + local removed_legacy = redis.call('ZREM', provider_key, session_id) + return {removed_legacy, 0} end

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

export const RELEASE_PROVIDER_SESSION = `

local provider_key = KEYS[1]

local ref_key = KEYS[2]

local session_id = ARGV[1]

local current_refs = tonumber(redis.call('HGET', ref_key, session_id) or '0')

if current_refs <= 0 then

return {0, 0}

end

local remaining_refs = current_refs - 1

if remaining_refs > 0 then

redis.call('HSET', ref_key, session_id, remaining_refs)

return {0, remaining_refs}

end

redis.call('HDEL', ref_key, session_id)

local removed = redis.call('ZREM', provider_key, session_id)

return {removed, remaining_refs}

`;

export const RELEASE_PROVIDER_SESSION = `

local provider_key = KEYS[1]

local ref_key = KEYS[2]

local session_id = ARGV[1]

local current_refs = tonumber(redis.call('HGET', ref_key, session_id) or '0')

if current_refs <= 0 then

-- Legacy member without ref tracking (created before ref-count rollout):

-- best-effort remove from ZSET so eager release path still works during migration.

local removed_legacy = redis.call('ZREM', provider_key, session_id)

return {removed_legacy, 0}

end

local remaining_refs = current_refs - 1

if remaining_refs > 0 then

redis.call('HSET', ref_key, session_id, remaining_refs)

return {0, remaining_refs}

end

redis.call('HDEL', ref_key, session_id)

local removed = redis.call('ZREM', provider_key, session_id)

return {removed, remaining_refs}

`;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/redis/lua-scripts.ts` around lines 103 - 122, The RELEASE_PROVIDER_SESSION Lua script currently returns early when HGET(ref_key, session_id) is missing or zero, leaving ZSET members orphaned; modify RELEASE_PROVIDER_SESSION so that when current_refs <= 0 it still performs redis.call('ZREM', provider_key, session_id) (remove the member from the ZSET) and then return {removed, 0}; keep the existing logic for remaining_refs > 0 and for the HDEL path, but ensure the early-return case does ZREM as a fallback cleanup so legacy ZSET-only sessions are removed atomically by the script.

coderabbitai · 2026-04-25T16:17:09Z

+      if (cleanupExpiredSessionsResultIndex !== null && results) {
+        const expiredResult = results[cleanupExpiredSessionsResultIndex];
+        if (!expiredResult?.[0] && Array.isArray(expiredResult?.[1])) {
+          const expiredSessionIds = expiredResult[1].filter(
+            (value): value is string => typeof value === "string" && value.length > 0
+          );
+          if (expiredSessionIds.length > 0) {
+            await redis.hdel(providerRefKey, ...expiredSessionIds);
+          }
+        }
+      }


⚠️ Potential issue | 🟡 Minor

post-pipeline hdel 与并发 zadd 之间存在小窗口的引用一致性偏差。

pipeline.exec() 完成后，再发起 redis.hdel(providerRefKey, ...expiredSessionIds) 是非原子的：在两次 round-trip 之间，并发的 refreshSession/checkAndTrackProviderSession 可能用更新的分数将其中某个被判过期的 sessionId 重新写回 ZSET（zremrangebyscore 不会删除它），但本次 hdel 仍会按缓存的 ID 列表删除其在 refs 中的引用计数，从而出现 “ZSET 里有该 sessionId、refs 里没有” 的孤儿记录。后续 releaseProviderSession 命中该会话时，引用计数会下溢。

考虑到清理只有 1% 的触发概率、时间窗很小，且属于尽力而为的清理，影响范围有限；但若希望彻底消除该窗口，可把 hdel 也合并进同一 pipeline（或封装为单条 Lua），以保证读取/删除/HDEL 的语义一致。countFromZSet 在行 602-609 的写法存在同样的窗口（zrangebyscore 与 zremrangebyscore、hdel 是三次独立调用）。

♻️ 可选：将 `hdel` 合并至同一 pipeline

if (cleanupExpiredSessionsResultIndex !== null && results) { const expiredResult = results[cleanupExpiredSessionsResultIndex]; if (!expiredResult?.[0] && Array.isArray(expiredResult?.[1])) { const expiredSessionIds = expiredResult[1].filter( (value): value is string => typeof value === "string" && value.length > 0 ); if (expiredSessionIds.length > 0) { - await redis.hdel(providerRefKey, ...expiredSessionIds); + // 合并到下一次 refresh 的 pipeline，或使用 Lua 脚本完成 zrange/zrem/hdel 的原子清理。 + await redis.hdel(providerRefKey, ...expiredSessionIds); } } }

更彻底的做法是将 zrange/zrem/hdel 合并为一段 Lua（与 getProviderSessionCountBatch 共用）。

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/session-tracker.ts` around lines 257 - 267, The post-pipeline redis.hdel call (after pipeline.exec()) can race with concurrent refreshSession/checkAndTrackProviderSession updates and produce ZSET refs/Hashmap inconsistencies; move the hdel into the same Redis pipeline or combine the read/remove/hdel logic into a single atomic operation (e.g., include the hdel in the same pipeline used around cleanupExpiredSessionsResultIndex and providerRefKey or replace the sequence with a Lua script) so that expiredResult processing (cleanupExpiredSessionsResultIndex -> expiredSessionIds -> redis.hdel) becomes atomic; apply the same fix pattern to countFromZSet where zrangebyscore, zremrangebyscore and hdel are separate calls.

ding113 and others added 2 commits April 25, 2026 14:29

fix(rate-limit): release failed provider sessions

537cec3

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

fix(proxy): release provider sessions on fallback

f454dcd

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

github-project-automation Bot added this to Claude Code Hub Roadmap Apr 25, 2026

github-project-automation Bot moved this to Backlog in Claude Code Hub Roadmap Apr 25, 2026

github-actions Bot added bug Something isn't working area:provider area:Rate Limit labels Apr 25, 2026

gemini-code-assist Bot reviewed Apr 25, 2026

View reviewed changes

github-actions Bot added the size/M Medium PR (< 500 lines) label Apr 25, 2026

coderabbitai Bot requested changes Apr 25, 2026

View reviewed changes

Comment thread src/app/v1/_lib/proxy/forwarder.ts Outdated

github-actions Bot reviewed Apr 25, 2026

View reviewed changes

fix(proxy): release hedge loser sessions

a9fcbd6

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 25, 2026

View reviewed changes

coderabbitai Bot approved these changes Apr 25, 2026

View reviewed changes

fix(rate-limit): preserve provider session refs

5375ce4

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

coderabbitai Bot requested changes Apr 25, 2026

View reviewed changes

ding113 merged commit 331e6b7 into dev Apr 25, 2026
17 checks passed

github-project-automation Bot moved this from Backlog to Done in Claude Code Hub Roadmap Apr 25, 2026

github-actions Bot mentioned this pull request Apr 26, 2026

release v0.7.3 #1110

Merged

coderabbitai Bot mentioned this pull request Apr 26, 2026

fix(proxy): clean up client abort listeners #1113

Merged

ding113 deleted the fix/redis-provider-failure-20260425 branch May 13, 2026 06:22

Uh oh!

Conversation

ding113 commented Apr 25, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Changes

Core Changes

Test Coverage

Related Issues/PRs

Verification

Testing

Automated Tests

Manual Testing

Breaking Changes

Checklist

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

coderabbitai Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

ding113 Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 25, 2026

🧪 测试结果

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review Summary

PR Size: M

Review Coverage

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review Summary

PR Size: M

Review Coverage

Implementation Highlights

Notes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 25, 2026

🧪 测试结果

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 25, 2026

🧪 测试结果

Uh oh!

ding113 commented Apr 25, 2026 •

edited by greptile-apps Bot

Loading

coderabbitai Bot commented Apr 25, 2026 •

edited

Loading