fix: keep SSE stream alive with idle heartbeats (#597) #609
Merged
Large-context sessions over ngrok/cloudflared tunnels froze mid-stream:
gpt-5.x can spend tens of seconds in reasoning with zero output bytes, so
the SSE connection sits idle and gets silently FIN'd by the tunnel/LB/NAT
idle timeout (logged as client-abort). Restarting the proxy did nothing;
only trimming conversation history (shrinking context) recovered.
streamResponse now writes an SSE comment line (': ping') whenever no real
chunk has been forwarded for HEARTBEAT_INTERVAL_MS (15s, well under common
30-60s idle timeouts). Comment lines are ignored by any spec-compliant SSE parser, so
no format (openai/anthropic/gemini/responses) is polluted and they are not
counted in the stream trace; active streaming never triggers a heartbeat.
Both streaming paths also set X-Accel-Buffering: no to stop nginx-class
proxies from buffering the heartbeats.
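The idle-heartbeat idea above can be sketched as a minimal TypeScript wrapper, assuming the upstream stream is available as an AsyncIterable<string> of already-formatted SSE chunks and that the response is written through a callback. pipeWithHeartbeat, its signature, and the HEARTBEAT constant are hypothetical stand-ins, not the actual streamResponse; only the ': ping' comment, the 15s default, heartbeatMs=0, cleanup in finally, and unref() come from this PR.

```typescript
// Sketch only: pipeWithHeartbeat is a hypothetical stand-in for streamResponse.
const HEARTBEAT_INTERVAL_MS = 15_000; // 15s, below common 30-60s idle timeouts
const HEARTBEAT = ": ping\n\n"; // SSE comment line; spec-compliant parsers discard it

function unrefTimer(t: ReturnType<typeof setInterval>): void {
  // Node timers expose unref(), so a pending heartbeat never keeps the process alive.
  (t as unknown as { unref?: () => void }).unref?.();
}

async function pipeWithHeartbeat(
  source: AsyncIterable<string>,
  write: (chunk: string) => void,
  heartbeatMs: number = HEARTBEAT_INTERVAL_MS,
): Promise<void> {
  let timer: ReturnType<typeof setInterval> | undefined;

  // (Re)start the idle clock: a ping fires only after heartbeatMs of silence,
  // then every heartbeatMs until the next real chunk arrives.
  const rearm = (): void => {
    clearInterval(timer);
    timer = undefined;
    if (heartbeatMs <= 0) return; // heartbeatMs = 0 disables heartbeats
    timer = setInterval(() => write(HEARTBEAT), heartbeatMs);
    unrefTimer(timer);
  };

  try {
    rearm();
    for await (const chunk of source) {
      write(chunk);
      rearm(); // real data resets the clock, so an active stream never sees a ping
    }
  } finally {
    // Stop the timer however the loop exits (normal end, or an error from source or write).
    clearInterval(timer);
  }
}
```

Restarting the interval on every forwarded chunk is what keeps heartbeats out of an active stream: a ping can only appear after a full interval with no real data, which is exactly the window in which a tunnel's idle timer would otherwise be running down.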
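Background
Issue #597: a client reaching the proxy through an ngrok / cloudflared tunnel saw streaming responses freeze after a few conversation turns (context ~32-50K). Symptoms:
- status 200, stream: true, response headers returned normally
- client-abort: the client side (the tunnel) disconnected first; this was not an upstream error
Root cause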
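With a large context, gpt-5.x can spend tens of seconds in the reasoning phase without producing a single output byte. The chat-completions stream sends just one role chunk at the start and then stays completely silent until the first reasoning_summary / output_text (with summary: "auto", short tasks may emit no summary delta at all). Once that silence exceeds the tunnel / LB / NAT idle timeout, the connection is silently FIN'd → s.onAbort fires and logs client-abort → the upstream request is cut off. The code already had a related comment (codex-event-extractor.ts: long gpt-5.x reasoning can hit the upstream duration cap); this PR addresses an earlier failure point, the idle timeout of the intermediate layer.
Changes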
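streamResponse in response-processor.ts, the single point where both streaming paths (the account-pool streaming-handler.ts and the direct-connection direct-request-handler.ts) converge:
- whenever no real chunk has been forwarded for one heartbeat interval (HEARTBEAT_INTERVAL_MS, default 15s, well under the common 30-60s idle timeouts), it writes an SSE comment line ': ping\n\n' to keep the connection alive
- the timer is stopped in finally; heartbeatMs=0 disables the heartbeat; the timer is unref()'d so it does not keep the event loop alive
- X-Accel-Buffering: no is set so that nginx-class reverse proxies do not buffer the heartbeats / deltas
Tests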
Added tests/unit/stream-heartbeat.test.ts (5 cases), including heartbeatMs=0 disabling the heartbeat.
Verification:
- npx tsc --noEmit → exit 0
- npm test → 241 files / 2384 passed, 1 skipped
Pending confirmation in a real environment (E2E)
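The unit tests show that the heartbeat mechanism itself works, but repeated reproduction with a real tunnel, a large context, and long reasoning still needs to be confirmed in the reporter's environment: run behind ngrok/cloudflared and replay the ~50K window that previously froze at least 3 times in a row to see whether it still disconnects. Locally, curl -N can be used to check that the SSE stream of a long-reasoning request periodically shows ': ping'.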
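The curl -N check can also be scripted. The sketch below is a hypothetical TypeScript helper, not part of this PR: it reads an SSE body chunk by chunk, records when each ': ping' comment line completes, and reports the longest gap with no bytes received. It assumes the body is an AsyncIterable<Uint8Array> (in Node 18+, a fetch response body can be consumed with for await); the name watchHeartbeats and the injectable clock are illustrative.

```typescript
// Hypothetical helper: record when ": ping" heartbeats arrive on an SSE body
// and the longest silence between received chunks.
interface HeartbeatReport {
  pingAt: number[]; // clock value at which each ": ping" line completed
  longestGapMs: number; // longest stretch with no bytes received
}

async function watchHeartbeats(
  body: AsyncIterable<Uint8Array>,
  now: () => number = Date.now, // injectable clock, e.g. for tests
): Promise<HeartbeatReport> {
  const decoder = new TextDecoder();
  const pingAt: number[] = [];
  let pending = ""; // partial line carried over between chunks
  let last = now();
  let longestGapMs = 0;

  for await (const bytes of body) {
    const t = now();
    longestGapMs = Math.max(longestGapMs, t - last);
    last = t;
    pending += decoder.decode(bytes, { stream: true });
    const lines = pending.split("\n");
    pending = lines.pop() ?? ""; // last element is an incomplete line (or "")
    for (const line of lines) {
      // Strip a trailing CR so CRLF-terminated streams match too.
      if (line.replace(/\r$/, "") === ": ping") pingAt.push(t);
    }
  }
  return { pingAt, longestGapMs };
}
```

For example, send the long-reasoning request to the local proxy with fetch and pass the response body to watchHeartbeats: with heartbeats working, longestGapMs should stay at roughly the 15s interval even while the model is reasoning silently, instead of growing until the tunnel's idle timeout.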