feat(voice): expose item_id on UserInputTranscribedEvent (closes #6109) by tsushanth · Pull Request #6127 · livekit/agents

tsushanth · 2026-06-16T18:36:02Z

Why

Closes #6109.

`AgentSession`'s `user_input_transcribed` event fires once per interim transcript on realtime models — every streamed delta produces a new `is_final=False` event, so a single user utterance produces many events with no stable correlation key. The internal `llm.InputTranscriptionCompleted` already carries an `item_id` that uniquely identifies the utterance, but when `AgentActivity._on_input_audio_transcription_completed` re-emits it upward as `UserInputTranscribedEvent`, the id is dropped on the floor:

```python
# livekit-agents/livekit/agents/voice/agent_activity.py (before)
def _on_input_audio_transcription_completed(self, ev: llm.InputTranscriptionCompleted) -> None:
    self._session._user_input_transcribed(
        UserInputTranscribedEvent(transcript=ev.transcript, is_final=ev.is_final)
    )  # ev.item_id is dropped
```

Consequence: consumers that need per-utterance dedup — e.g. "notify the frontend 'user speech received' so it renders a placeholder exactly once" — must either keep manual `last_item_id` state the event can't actually provide, or bypass the provider-agnostic event entirely and read `item_id` from `openai_server_event_received`. That escape hatch isn't portable: Gemini's realtime plugin doesn't emit `openai_server_event_received` at all.

Fix

Add `item_id: str | None = None` to `UserInputTranscribedEvent` and thread it through the realtime emission site:

```python
# livekit-agents/livekit/agents/voice/events.py (added field)
class UserInputTranscribedEvent(BaseModel):
    ...
    item_id: str | None = None
    """Stable id identifying the user utterance this transcript belongs to. On
    realtime models, every interim and final UserInputTranscribedEvent for a
    single utterance shares the same item_id, so consumers can dedup interim
    transcripts and react exactly once per utterance using the provider-agnostic
    event surface. None on STT paths where no upstream item id exists."""
```

```python
# livekit-agents/livekit/agents/voice/agent_activity.py (threaded through)
def _on_input_audio_transcription_completed(self, ev: llm.InputTranscriptionCompleted) -> None:
    self._session._user_input_transcribed(
        UserInputTranscribedEvent(
            transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
        )
    )
```

STT paths (`on_interim_transcript` / `on_final_transcript` in the same file) leave `item_id` at the default `None` because the STT layer has no corresponding upstream id concept. Existing subscribers reading `transcript` / `is_final` / `language` / `speaker_id` / `created_at` see no behavioral change.
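
On the consumer side, the new field reduces per-utterance dedup to a small amount of state. The following is a minimal sketch, not code from this PR: `TranscriptEvent` is a hypothetical stand-in for `UserInputTranscribedEvent` (only the three fields used here), and `UtteranceTracker` and `notify` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TranscriptEvent:
    """Hypothetical stand-in for UserInputTranscribedEvent (subset of fields)."""

    transcript: str
    is_final: bool
    item_id: str | None = None


@dataclass
class UtteranceTracker:
    """Calls `notify` once per distinct item_id, ignoring repeated events."""

    notify: Callable[[str], None]
    _seen: set[str] = field(default_factory=set)

    def on_user_input_transcribed(self, ev: TranscriptEvent) -> None:
        if ev.item_id is None:
            # STT path: no upstream id, so there is nothing to dedup on.
            return
        if ev.item_id in self._seen:
            # Another interim (or the final) for an utterance already reported.
            return
        self._seen.add(ev.item_id)
        self.notify(ev.item_id)


# Three events for one utterance, one for a second, one STT-style event.
notified: list[str] = []
tracker = UtteranceTracker(notify=notified.append)
for ev in (
    TranscriptEvent("hel", False, "item_1"),
    TranscriptEvent("hello", False, "item_1"),
    TranscriptEvent("hello there", True, "item_1"),
    TranscriptEvent("bye", True, "item_2"),
    TranscriptEvent("from stt", True),
):
    tracker.on_user_input_transcribed(ev)
```

In this sketch the STT-style event is skipped rather than reported; a real consumer would need its own policy for `item_id=None`, and would likely want to prune `_seen` in long sessions.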

Test

New `tests/test_user_input_transcribed_event.py` (4 unit tests):

- `test_user_input_transcribed_event_carries_item_id`: schema accepts the field
- `test_user_input_transcribed_event_item_id_defaults_to_none`: backwards-compat for STT paths
- `test_user_input_transcribed_event_serialises_item_id`: `model_dump` includes the field (relevant for the cross-process host transport already exercised in `test_session_host.py`)
- `test_input_transcription_completed_item_id_can_thread_to_event`: pins the realtime data flow without instantiating the full `AgentActivity` (Pydantic schema contract is the boundary under test)

All four fail on unpatched source because Pydantic rejects the unknown `item_id` kwarg.

```
$ uv run pytest tests/test_user_input_transcribed_event.py --unit -v
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_carries_item_id
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_item_id_defaults_to_none
PASSED tests/test_user_input_transcribed_event.py::test_user_input_transcribed_event_serialises_item_id
PASSED tests/test_user_input_transcribed_event.py::test_input_transcription_completed_item_id_can_thread_to_event
4 passed in 0.05s
```
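
The test bodies are not shown in this description. As a rough illustration of the contract the four test names describe, here is a sketch written against stand-in dataclasses rather than the real Pydantic model. `FakeEvent` and `FakeCompleted` are hypothetical substitutes for `UserInputTranscribedEvent` and `llm.InputTranscriptionCompleted`, and `dataclasses.asdict` plays the role of `model_dump`.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class FakeEvent:
    """Stand-in for UserInputTranscribedEvent (the real class is a Pydantic model)."""

    transcript: str
    is_final: bool
    item_id: str | None = None


@dataclass
class FakeCompleted:
    """Stand-in for llm.InputTranscriptionCompleted (fields the handler reads)."""

    item_id: str
    transcript: str
    is_final: bool


def test_carries_item_id() -> None:
    assert FakeEvent("hi", True, item_id="item_1").item_id == "item_1"


def test_item_id_defaults_to_none() -> None:
    # STT-style construction: the caller never mentions item_id.
    assert FakeEvent("hi", True).item_id is None


def test_serialises_item_id() -> None:
    assert asdict(FakeEvent("hi", False, item_id="item_1"))["item_id"] == "item_1"


def test_item_id_can_thread_to_event() -> None:
    # Mirrors the realtime handler: copy the three fields across unchanged.
    src = FakeCompleted(item_id="item_1", transcript="hi", is_final=True)
    ev = FakeEvent(transcript=src.transcript, is_final=src.is_final, item_id=src.item_id)
    assert ev.item_id == src.item_id
```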

Backwards compatibility

Strictly additive change. `item_id` is optional and defaults to `None`, so:

- Existing call sites that construct `UserInputTranscribedEvent` without `item_id` continue to work (verified in the test where the STT-style construction without `item_id` round-trips with `item_id=None`)
- Existing subscribers that don't read `item_id` see no behavioral change
- The new field surfaces via the existing event surface, so no new event-name registration is needed downstream


devin-ai-integration

Devin Review found 1 potential issue.

devin-ai-integration · 2026-06-16T18:38:03Z

+            UserInputTranscribedEvent(
+                transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
+            )
        )


🚩 Remote session transport does not forward item_id

The `_on_user_input_transcribed` handler in `livekit-agents/livekit/agents/voice/remote_session.py:466-474` constructs the protobuf `UserInputTranscribed` message with only `transcript` and `is_final`; the new `item_id` field is not forwarded. This means remote session consumers won't receive the `item_id` for dedup purposes. This is not a regression (the protobuf schema would need a separate update), but it limits the utility of the feature for remote/distributed deployments.
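
To make the gap concrete, here is a hedged sketch of what forwarding could look like if the wire message gained an optional id. Nothing here is the real generated code: `WireUserInputTranscribed` is a hypothetical stand-in for the protobuf message, its `item_id` field does not exist in the current schema according to the note above, and `SessionEvent` and `to_wire` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SessionEvent:
    """Stand-in for UserInputTranscribedEvent (subset of fields)."""

    transcript: str
    is_final: bool
    item_id: str | None = None


@dataclass
class WireUserInputTranscribed:
    """Hypothetical wire message; proto3 string fields default to ""."""

    transcript: str = ""
    is_final: bool = False
    item_id: str = ""  # assumed new field, not part of the current schema


def to_wire(ev: SessionEvent) -> WireUserInputTranscribed:
    msg = WireUserInputTranscribed(transcript=ev.transcript, is_final=ev.is_final)
    if ev.item_id is not None:
        # Only set the id when the realtime path supplied one.
        msg.item_id = ev.item_id
    return msg
```

Under this assumption, a receiver would treat an empty `item_id` as "no id" and map it back to `None`.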


tsushanth requested a review from a team as a code owner June 16, 2026 18:36

devin-ai-integration Bot reviewed Jun 16, 2026

tsushanth wants to merge 1 commit into `livekit:main` from `tsushanth:fix/user-input-transcribed-event-item-id`
