### Bug Description
`AgentSession`'s `user_input_transcribed` event fires once per interim transcript. For realtime models (OpenAI/xAI), the input-transcription delta handler emits a new `input_audio_transcription_completed(is_final=False)` for every streamed delta chunk, so a single user utterance produces many `user_input_transcribed` events with `is_final=False`.
A common need is to react exactly once per utterance, e.g. to notify the frontend "user speech received" so it can render a placeholder before the agent responds. To do that you need a stable key to correlate all the interim events of the same utterance.
That key already exists internally: `livekit.agents.llm.InputTranscriptionCompleted` carries `item_id` (and `confidence`). But when `AgentActivity` re-emits it upward, the id is discarded:

    # livekit/agents/voice/agent_activity.py
    def _on_input_audio_transcription_completed(self, ev: llm.InputTranscriptionCompleted) -> None:
        self._session._user_input_transcribed(
            UserInputTranscribedEvent(transcript=ev.transcript, is_final=ev.is_final)
        )  # ev.item_id is dropped

`UserInputTranscribedEvent` itself only has `transcript`, `is_final`, `speaker_id`, `language`, and `created_at` (`livekit/agents/voice/events.py`); there is no `item_id`.
Consequence: to dedup per utterance, a consumer must either (a) keep manual `last_item_id` state that the event can't actually provide, or (b) bypass the provider-agnostic event entirely and subscribe to the raw `openai_server_event_received` to read `item_id` from the raw dict. Option (b) is not portable: Gemini's realtime plugin doesn't emit `openai_server_event_received` at all, so the same code can't work across providers.
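For illustration, workaround (b) looks roughly like the sketch below. It assumes the raw events are plain dicts shaped like the OpenAI Realtime API transcription events (a `type` string plus an `item_id`); `make_raw_event_handler` and `on_first` are hypothetical names introduced here, not part of livekit-agents.

```python
from __future__ import annotations

from typing import Any, Callable

# Event type names as documented for the OpenAI Realtime API.
_DELTA = "conversation.item.input_audio_transcription.delta"
_COMPLETED = "conversation.item.input_audio_transcription.completed"


def make_raw_event_handler(
    on_first: Callable[[str], None],
) -> Callable[[dict[str, Any]], None]:
    """Build a handler that calls `on_first(item_id)` once per utterance.

    Hypothetical helper: it only understands OpenAI-shaped raw events,
    which is exactly why this workaround is not portable.
    """
    seen: set[str] = set()

    def handle(event: dict[str, Any]) -> None:
        if event.get("type") not in (_DELTA, _COMPLETED):
            return  # unrelated server event
        item_id = event.get("item_id")
        if item_id is None or item_id in seen:
            return  # nothing to group by, or utterance already handled
        seen.add(item_id)
        on_first(item_id)

    return handle
```

The returned `handle` would be registered for `openai_server_event_received`. With a provider that never emits that event, it is simply never called.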
### Expected Behavior
`UserInputTranscribedEvent` should expose the `item_id` that is already present on `llm.InputTranscriptionCompleted`, so consumers can correlate all interim/final transcripts of a single utterance and act exactly once per utterance, using the provider-agnostic `user_input_transcribed` event, without dropping down to provider-specific raw events.
Concretely:
1. Add `item_id: str | None` to `UserInputTranscribedEvent` and pass `ev.item_id` through in `_on_input_audio_transcription_completed`.
2. (Optional) Include a stable timestamp marking when this item/utterance first started, so consumers can dedup/order without tracking state themselves.
This keeps a single, provider-agnostic subscription point that works uniformly across OpenAI, xAI, and Gemini.
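A minimal sketch of what the request would enable, written against simplified stand-in dataclasses rather than the real livekit-agents classes. The `item_id` field on `UserInputTranscribedEvent` is the proposed addition and does not exist in 1.5.17; `to_user_event` and `OncePerUtterance` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class InputTranscriptionCompleted:
    """Stand-in for llm.InputTranscriptionCompleted (fields reduced)."""
    item_id: str
    transcript: str
    is_final: bool


@dataclass
class UserInputTranscribedEvent:
    """Stand-in for the session-level event, with the proposed field."""
    transcript: str
    is_final: bool
    item_id: str | None = None  # proposed; not in livekit-agents 1.5.17


def to_user_event(ev: InputTranscriptionCompleted) -> UserInputTranscribedEvent:
    # Same mapping AgentActivity does today, but the id is passed through.
    return UserInputTranscribedEvent(
        transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
    )


class OncePerUtterance:
    """Calls `on_first` for the first event seen with each item_id."""

    def __init__(self, on_first: Callable[[UserInputTranscribedEvent], None]) -> None:
        self._on_first = on_first
        self._seen: set[str] = set()

    def __call__(self, ev: UserInputTranscribedEvent) -> None:
        if ev.item_id is None or ev.item_id in self._seen:
            return  # no id to group by, or utterance already handled
        self._seen.add(ev.item_id)
        self._on_first(ev)
```

With the field in place, a consumer could register something like `session.on("user_input_transcribed", OncePerUtterance(notify_frontend))`, where `notify_frontend` is a hypothetical callback. The set of seen ids grows for the lifetime of the session, so a long-running consumer would likely want to prune it once the final transcript for an item arrives.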
### Reproduction Steps
1. Start an `AgentSession` with any realtime model (e.g. OpenAI/xAI gpt-realtime).
2. Register a handler: `session.on("user_input_transcribed", handler)`.
3. Speak one sentence.
4. Observe that the handler fires many times with `is_final=False` during the single utterance, and that the event provides no `item_id` to group them.
### Operating System
macOS
### Models Used
Realtime: xAI realtime, OpenAI gpt-realtime, Gemini
### Package Versions
livekit-agents 1.5.17
livekit (rtc) 1.1.8
livekit-plugins-openai 1.5.17
Python 3.13, macOS
### Session/Room/Call IDs
No response
### Proposed Solution
### Additional Context
No response
### Screenshots and Recordings
No response