fix: keep SSE stream alive with idle heartbeats (#597) #609

Merged: icebear0828 merged 1 commit into dev from fix/stream-idle-heartbeat on Jun 2, 2026
@icebear0828
Owner

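Background

Issue #597: clients that reach the proxy through an ngrok / cloudflared tunnel see the streaming response hang after a few conversation turns (context ~32-50K). Symptoms:

  • The upstream egress log shows status 200, stream:true, and the response headers come back normally
  • A client-abort is recorded, however: the client side (the tunnel) disconnects first; the upstream does not report an error
  • Restarting codex-proxy does not help; only reverting / clearing the Cursor history (bringing the context back under the threshold) restores service
  • A brand-new window also reproduces the hang once it reaches ~32K, so the problem correlates strongly with context size
  • ngrok and cloudflared behave identically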

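Root cause

With a large context, gpt-5.x can spend tens of seconds in the reasoning phase without producing a single output byte. The chat-completions stream sends only a role chunk at the start and then stays completely silent until the first reasoning_summary/output_text (with summary:"auto", short tasks may not even emit a summary delta). When this silent period exceeds the idle timeout of the tunnel / LB / NAT, the connection is silently FIN'd, s.onAbort fires and records client-abort, and the upstream request is torn down. The code already carries a related comment (codex-event-extractor.ts: long gpt-5.x reasoning can hit the upstream duration cap); this PR addresses the earlier failure point, the idle timeout in the intermediate layers.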

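Changes

streamResponse in response-processor.ts (the single point where both streaming paths converge: the account-pool path in streaming-handler.ts and the direct path in direct-request-handler.ts):

  • During a silent gap (more than HEARTBEAT_INTERVAL_MS since the last write; default 15s, well below the common 30-60s idle timeouts), write an SSE comment line : ping\n\n as a keepalive
  • Comment lines are ignored by all SSE parsers, so they do not pollute the content of any format (openai/anthropic/gemini/responses) and are not counted in the stream trace
  • While the stream is actively pushing data the idle check never fires, so there are zero extra bytes; the timer is stopped in finally; heartbeatMs=0 disables the heartbeat; the timer is unref()'d so it does not hold the event loop open
  • Both places where a stream is set up now add X-Accel-Buffering: no, so that nginx-style reverse proxies do not buffer heartbeats or deltas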
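
The idle-heartbeat behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual response-processor.ts code: startHeartbeat, touch, tick, Write and SSE_HEADERS are hypothetical names, the injected clock exists only to make the idle check easy to exercise, and polling at half the interval is an assumption of this sketch. Only HEARTBEAT_INTERVAL_MS (15s), the : ping comment line, heartbeatMs=0, unref() and X-Accel-Buffering: no come from the PR description.

```typescript
// Hypothetical sketch of an idle heartbeat for an SSE stream.
const HEARTBEAT_INTERVAL_MS = 15_000; // default named in the PR description
const SSE_PING = ": ping\n\n"; // SSE comment line; parsers skip it

// Assumed header set; only X-Accel-Buffering comes from the PR description.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no", // ask nginx-style proxies not to buffer
};

type Write = (chunk: string) => void;

interface Heartbeat {
  touch(): void; // call after every real chunk to reset the idle clock
  stop(): void; // call from a finally block when the stream ends
  tick(at: number): boolean; // one idle check; true if a ping was written
}

function startHeartbeat(
  write: Write,
  heartbeatMs: number = HEARTBEAT_INTERVAL_MS,
  now: () => number = Date.now,
): Heartbeat {
  let lastWriteAt = now();
  let timer: ReturnType<typeof setInterval> | undefined;

  const tick = (at: number): boolean => {
    // Disabled, or a write happened recently enough: send nothing.
    if (heartbeatMs <= 0 || at - lastWriteAt <= heartbeatMs) return false;
    write(SSE_PING);
    lastWriteAt = at;
    return true;
  };

  if (heartbeatMs > 0) {
    // Poll at half the interval so a ping lands within 1.5x the interval.
    const pollMs = Math.max(1, Math.floor(heartbeatMs / 2));
    timer = setInterval(() => tick(now()), pollMs);
    // In Node, unref() keeps this timer from holding the process open.
    (timer as unknown as { unref?: () => void }).unref?.();
  }

  return {
    touch: () => {
      lastWriteAt = now();
    },
    stop: () => {
      if (timer !== undefined) clearInterval(timer);
      timer = undefined;
    },
    tick,
  };
}
```

A caller would set SSE_HEADERS on the response, call touch() after forwarding each real chunk, and call stop() in a finally block, so an actively streaming response never receives a ping.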

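Tests

Added tests/unit/stream-heartbeat.test.ts (5 cases):

  • A silent gap triggers a heartbeat, and the heartbeat lands between two real chunks
  • No heartbeat is triggered when the stream ends within the interval
  • The timer is stopped after the stream ends (waiting longer than the interval produces no further heartbeats)
  • heartbeatMs=0 disables the heartbeat
  • The default interval threshold is ≤ 25s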

Verification:

  • npx tsc --noEmit → exit 0
  • npm test → 241 files / 2384 passed, 1 skipped

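Pending confirmation in a real environment (E2E)

The unit tests show that the heartbeat mechanism itself is correct, but the combination of a real tunnel, a large context and long reasoning still needs to be confirmed in the reporter's environment: attach ngrok/cloudflared and run the ~50K window that previously hung at least 3 times in a row to see whether it still disconnects. Locally, curl -N can be used to check whether : ping appears periodically in the SSE stream of a long-reasoning request.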
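
For the local check, a captured curl -N transcript can be scanned to count heartbeats and to confirm that they never show up as data. The sketch below is a simplified, hypothetical helper (scanSse and SseScan are not part of the PR) that handles only comment lines and data: fields, which is enough for this purpose but is not a full SSE parser.

```typescript
// Hypothetical helper: separate ": ping" comments from data events
// in raw SSE text, for example the saved output of curl -N.
interface SseScan {
  pings: number; // number of ": ping" comment lines seen
  dataEvents: string[]; // payload of each event that carried data
}

function scanSse(raw: string): SseScan {
  const result: SseScan = { pings: 0, dataEvents: [] };
  // Events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) {
        // Comment line: never part of an event's payload.
        if (line.slice(1).trim() === "ping") result.pings += 1;
        continue;
      }
      if (line.startsWith("data:")) {
        const value = line.slice(5);
        // A single leading space after the colon is not part of the value.
        data.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    if (data.length > 0) result.dataEvents.push(data.join("\n"));
  }
  return result;
}
```

A transcript from a long-reasoning request that shows a non-zero pings count, with dataEvents unchanged from a run without heartbeats, would be consistent with the behaviour this PR describes.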

Large-context sessions over ngrok/cloudflared tunnels froze mid-stream:
gpt-5.x can spend tens of seconds in reasoning with zero output bytes, so
the SSE connection sits idle and gets silently FIN'd by the tunnel/LB/NAT
idle timeout (logged as client-abort). Restarting the proxy did nothing;
only trimming conversation history (shrinking context) recovered.

streamResponse now writes an SSE comment line (': ping') whenever no real
chunk has been forwarded for HEARTBEAT_INTERVAL_MS (15s, well under common
30-60s idle timeouts). Comment lines are ignored by every SSE parser, so
no format (openai/anthropic/gemini/responses) is polluted and they are not
counted in the stream trace; active streaming never triggers a heartbeat.
Both streaming paths also set X-Accel-Buffering: no to stop nginx-class
proxies from buffering the heartbeats.
@icebear0828 icebear0828 force-pushed the fix/stream-idle-heartbeat branch from 57695f7 to adcbcd8 Compare June 2, 2026 04:14
@icebear0828 icebear0828 merged commit 306623c into dev Jun 2, 2026
3 checks passed
@icebear0828 icebear0828 deleted the fix/stream-idle-heartbeat branch June 2, 2026 04:36