Plugin / version
livekit-plugins-xai 1.6.0, livekit-agents 1.6.0, Python 3.13
Verified on the latest release by inspecting the livekit-plugins-xai 1.6.0 wheel from PyPI (uploaded 2026-06-11) and the current main branch (tts.py): the per-segment connect/close lifecycle in _run_ws is unchanged, and there is no ConnectionPool and no prewarm() override. The behavior was originally observed in production on 1.5.17, which is code-identical.
Problem
SynthesizeStream._run_ws dials a fresh WebSocket for every synthesized segment (ws = await self._tts._connect_ws(...) … finally: await self._tts._close_ws(ws)). In an agent session this means every agent turn pays a full DNS+TLS+WS handshake to api.x.ai (~0.45–0.6 s) before any audio can arrive.
Observed in production metrics: TTSMetrics.connection_reused is false on 75/75 synthesis events. Per-turn TTS node TTFB averaged ~0.65 s, of which raw generation accounted for only ~0.31 s; the remaining ~0.34 s is dominated by the per-turn handshake.
The xAI endpoint itself supports multiple synthesis rounds over a single WebSocket (verified by sending a second text.delta/text.done cycle after audio.done on the same connection, which synthesizes normally), so this is purely a plugin limitation.
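The reuse check described above can be reproduced in miniature. The sketch below is a self-contained simulation, not a client for api.x.ai: `FakeXaiSocket` is a hypothetical in-memory stand-in for one WebSocket, and the JSON field names (`type`, `delta`) and the `audio.delta` message type are assumptions; only `text.delta`, `text.done` and `audio.done` come from this report. It runs two text.delta/text.done rounds back to back on a socket that is opened once.

```python
import asyncio
import json
from typing import List


class FakeXaiSocket:
    """Hypothetical in-memory stand-in for one TTS WebSocket.

    Buffers text.delta messages; on text.done it emits one audio chunk
    (assumed message type "audio.delta") followed by audio.done, then
    waits for the next round on the same connection.
    """

    def __init__(self) -> None:
        self._pending_text: List[str] = []
        self._outbox: "asyncio.Queue[str]" = asyncio.Queue()

    async def send(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] == "text.delta":
            self._pending_text.append(msg["delta"])
        elif msg["type"] == "text.done":
            text = "".join(self._pending_text)
            self._pending_text.clear()
            # Fake "audio": the UTF-8 bytes of the text, hex-encoded.
            await self._outbox.put(json.dumps({"type": "audio.delta", "delta": text.encode().hex()}))
            await self._outbox.put(json.dumps({"type": "audio.done"}))

    async def recv(self) -> str:
        return await self._outbox.get()


async def synthesize_round(ws: FakeXaiSocket, text: str) -> bytes:
    """Run one text.delta/text.done -> audio.done cycle on an already-open socket."""
    await ws.send(json.dumps({"type": "text.delta", "delta": text}))
    await ws.send(json.dumps({"type": "text.done"}))
    audio = bytearray()
    while True:
        msg = json.loads(await ws.recv())
        if msg["type"] == "audio.delta":
            audio += bytes.fromhex(msg["delta"])
        elif msg["type"] == "audio.done":
            return bytes(audio)


async def main() -> List[bytes]:
    ws = FakeXaiSocket()  # "handshake" happens once
    first = await synthesize_round(ws, "Hello.")
    second = await synthesize_round(ws, "Second turn.")  # same socket, no new handshake
    return [first, second]


results = asyncio.run(main())
print(results)
```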
Comparison
The Cartesia and Deepgram TTS plugins already solve this with utils.ConnectionPool (+ prewarm()), which is also what populates connection_reused/acquire_time in TTSMetrics. The xai plugin has neither a pool nor a prewarm() override.
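For reference, the pooling pattern amounts to roughly the following. This is a hypothetical `MiniPool` written only to illustrate the idea; it is not livekit's `utils.ConnectionPool`, whose actual constructor and method signatures should be taken from livekit-agents itself. `prewarm()` dials one connection in the background, and `get()` reports whether the connection was reused and how long acquisition took, which are the two values that would feed `connection_reused` / `acquire_time`.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class Acquired(Generic[T]):
    conn: T
    reused: bool         # loosely mirrors TTSMetrics.connection_reused
    acquire_time: float  # loosely mirrors TTSMetrics.acquire_time (seconds)


class MiniPool(Generic[T]):
    """Hypothetical minimal pool for illustration; NOT livekit's utils.ConnectionPool."""

    def __init__(self, connect: Callable[[], Awaitable[T]]) -> None:
        self._connect = connect
        self._idle: List[T] = []
        self._prewarm_task: Optional["asyncio.Task[None]"] = None

    def prewarm(self) -> None:
        """Start dialing one connection in the background (call from a running loop)."""
        async def _warm() -> None:
            self._idle.append(await self._connect())

        if self._prewarm_task is None and not self._idle:
            self._prewarm_task = asyncio.create_task(_warm())

    async def get(self) -> Acquired[T]:
        start = time.perf_counter()
        task, self._prewarm_task = self._prewarm_task, None
        if task is not None:
            try:
                await task  # wait for an in-flight prewarm instead of dialing twice
            except Exception:
                pass  # prewarm failed; fall through and dial normally
        if self._idle:
            conn, reused = self._idle.pop(), True
        else:
            conn, reused = await self._connect(), False
        return Acquired(conn, reused, time.perf_counter() - start)

    def put(self, conn: T) -> None:
        """Return a connection whose synthesis finished cleanly."""
        self._idle.append(conn)


async def demo() -> Tuple[List[bool], int]:
    dials = 0

    async def connect() -> str:
        nonlocal dials
        dials += 1
        await asyncio.sleep(0.01)  # stands in for the DNS+TLS+WS handshake
        return f"ws-{dials}"

    pool: MiniPool[str] = MiniPool(connect)
    pool.prewarm()
    reused_flags = []
    for _ in range(3):  # three agent turns
        acquired = await pool.get()
        reused_flags.append(acquired.reused)
        pool.put(acquired.conn)  # turn completed normally
    return reused_flags, dials


flags, dials = asyncio.run(demo())
print(flags, dials)
```

With prewarm(), all three turns in this sketch get the same socket and only one dial happens; without it, the first get() would dial and report reused=False.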
Expected behavior
Connection reuse across segments via utils.ConnectionPool (with prewarm()), like the Cartesia/Deepgram plugins. One caveat worth noting for the implementation: the xAI protocol has no cancel message, so a connection whose generation was interrupted mid-stream must be discarded rather than returned to the pool, otherwise stale audio can leak into the next synthesis.
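The discard rule could look something like this. Again this is a sketch under assumptions: `TinyPool`, `FakeSocket` and `synthesize` are hypothetical, and an early `break` stands in for an interruption. The decision lives in `finally`, so a task cancelled mid-stream (the usual barge-in path) also ends up discarding its socket; only a connection that actually reached `audio.done` goes back to the pool.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


class FakeSocket:
    """Hypothetical socket that streams a few audio frames, then audio.done."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def stream(self, chunks: int) -> AsyncIterator[Tuple[str, Optional[int]]]:
        for i in range(chunks):
            await asyncio.sleep(0)  # yield control, as a real recv() would
            yield ("audio.delta", i)
        yield ("audio.done", None)


class TinyPool:
    """Hypothetical pool of idle sockets; not livekit's utils.ConnectionPool."""

    def __init__(self) -> None:
        self.idle: List[FakeSocket] = []
        self.dialed = 0

    def get(self) -> FakeSocket:
        if self.idle:
            return self.idle.pop()
        self.dialed += 1
        return FakeSocket(f"ws-{self.dialed}")

    def put(self, ws: FakeSocket) -> None:
        self.idle.append(ws)

    def discard(self, ws: FakeSocket) -> None:
        ws.closed = True  # a real implementation would close the WebSocket here


async def synthesize(pool: TinyPool, max_chunks: Optional[int]) -> FakeSocket:
    """Stream one segment; stop after max_chunks frames to mimic an interruption."""
    ws = pool.get()
    finished = False
    try:
        received = 0
        async for kind, _ in ws.stream(chunks=3):
            if kind == "audio.done":
                finished = True
                break
            received += 1
            if max_chunks is not None and received >= max_chunks:
                break  # interrupted: the server may still be generating audio
    finally:
        # No cancel message exists, so only a socket that reached audio.done is safe to reuse.
        if finished:
            pool.put(ws)
        else:
            pool.discard(ws)
    return ws


async def demo() -> Tuple[bool, bool, int]:
    pool = TinyPool()
    interrupted = await synthesize(pool, max_chunks=1)  # turn 1: cut off mid-stream
    complete = await synthesize(pool, max_chunks=None)  # turn 2: must dial fresh
    third = await synthesize(pool, max_chunks=None)     # turn 3: reuses turn 2's socket
    return interrupted.closed, third is complete, pool.dialed


outcome = asyncio.run(demo())
print(outcome)
```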