feat(plugins): add FunASR self-hosted STT plugin by LauraGPT · Pull Request #6129 · livekit/agents

LauraGPT · 2026-06-16T19:06:33Z

Adds livekit-plugins-funasr, a non-streaming, self-hosted STT plugin backed by FunASR (SenseVoice / Paraformer / Fun-ASR-Nano). It runs fully locally with no cloud API and is strong on Chinese and 50+ languages.

Design

Implements STT._recognize_impl(buffer) → combine frames → FunASR AutoModel.generate → SpeechEvent(FINAL_TRANSCRIPT).
Declares STTCapabilities(streaming=False), so LiveKit wraps it with a VAD StreamAdapter for real-time agents (same pattern as other non-streaming plugins).
Lazy model load; FunASR runs in an executor (non-blocking).

Tested

On an H100: STT(model='FunAudioLLM/SenseVoiceSmall', hub='hf', device='cuda') transcribes a Chinese clip and returns a FINAL_TRANSCRIPT event with the correct text. Package imports + registers cleanly.

Usage

from livekit.plugins import funasr
stt = funasr.STT(model='iic/SenseVoiceSmall', device='cuda')          # ModelScope
stt = funasr.STT(model='FunAudioLLM/SenseVoiceSmall', hub='hf', device='cuda')  # HuggingFace

Resolves #5897. Happy to add CHANGELOG / CI wiring per your conventions — let me know what's needed.

Adds `livekit-plugins-funasr`: a non-streaming STT plugin backed by [FunASR](https://github.com/modelscope/FunASR) (SenseVoice / Paraformer / Fun-ASR-Nano), running fully locally with no cloud API. Strong on Chinese and 50+ languages; SenseVoice also returns language/emotion/event tags. Implements `STT._recognize_impl` (combine frames -> FunASR -> SpeechEvent) and declares `STTCapabilities(streaming=False)`, so LiveKit wraps it with a VAD StreamAdapter for real-time agents. Tested: transcribes a Chinese clip via the STT interface and returns a FINAL_TRANSCRIPT event. Resolves livekit#5897.

CLAassistant · 2026-06-16T19:06:47Z

All committers have signed the CLA.

devin-ai-integration

Devin Review found 3 potential issues.

devin-ai-integration · 2026-06-16T19:10:06Z

+    def _ensure_model(self):
+        if self._model is None:
+            from funasr import AutoModel
+
+            kwargs = dict(model=self._opts.model, device=self._opts.device, hub=self._opts.hub, disable_update=True)
+            if self._vad_model:
+                kwargs.update(vad_model=self._vad_model, vad_kwargs={"max_single_segment_time": 30000})
+            logger.info("loading FunASR model %s on %s", self._opts.model, self._opts.device)
+            self._model = AutoModel(**kwargs)
+        return self._model


🔴 No thread-safety for lazy model init and concurrent inference in thread executor

_ensure_model() is called from _run() which executes in a thread pool via run_in_executor (stt.py:98). There is no lock guarding the check-then-set on self._model (stt.py:58), so concurrent _recognize_impl calls can race: two threads both see self._model is None, both load the model (wasting resources and time), and one loaded instance is silently discarded. More critically, after the model is initialized, concurrent _run invocations will call model.generate() simultaneously on the same FunASR/PyTorch model instance. PyTorch models are not thread-safe for forward passes (they share internal buffers), which can cause crashes (especially on CUDA) or silently produce incorrect transcription results.

Prompt for agents

The _ensure_model method and subsequent model.generate() call in _run are not protected by any lock, yet they run in a thread pool executor where concurrent execution is possible. Add a threading.Lock to the STT class (initialized in __init__) and acquire it in the _run function (or at minimum around _ensure_model and model.generate). This ensures: (1) the model is loaded exactly once, and (2) inference calls are serialized to avoid PyTorch thread-safety issues. For example, in __init__ add self._lock = threading.Lock(), then in _run wrap the body with 'with self._lock:'. Alternatively, separate initialization locking (which only needs to guard _ensure_model) from inference locking (which guards model.generate).
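A minimal sketch of the single-lock variant suggested above, not the plugin's actual code. `FakeModel` and `LazyRecognizer` are hypothetical stand-ins so the example runs without FunASR installed; only the `threading.Lock` pattern is the point.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Hypothetical stand-in for a FunASR AutoModel; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        FakeModel.instances += 1

    def generate(self, audio: bytes) -> str:
        return f"{len(audio)} bytes"


class LazyRecognizer:
    """Hypothetical stand-in for the plugin's STT class."""

    def __init__(self) -> None:
        self._model = None
        self._lock = threading.Lock()

    def _ensure_model(self) -> FakeModel:
        # Caller must already hold self._lock, so the check-then-set is atomic.
        if self._model is None:
            self._model = FakeModel()
        return self._model

    def run(self, audio: bytes) -> str:
        # One lock covers both lazy loading and inference: the model is built
        # exactly once and generate() calls are serialized.
        with self._lock:
            return self._ensure_model().generate(audio)


def transcribe_concurrently(n: int) -> list[str]:
    recognizer = LazyRecognizer()
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(recognizer.run, [b"\x00" * 4] * n))
```

Serializing inference trades throughput for safety. If concurrent requests matter, the alternative mentioned above (one lock for loading, a separate policy for inference) would be the place to look.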

devin-ai-integration · 2026-06-16T19:10:08Z

+        try:
+            text = await asyncio.get_event_loop().run_in_executor(None, _run)
+        except Exception as e:  # noqa: BLE001
+            raise APIConnectionError() from e


🚩 Broad exception catch masks local errors as retriable APIConnectionError

At stt.py:99-100, all exceptions (including KeyError, ValueError, CUDA OOM, etc.) are caught and re-raised as APIConnectionError. The base class recognize() method at livekit-agents/livekit/agents/stt/stt.py:204-248 retries on APIError (parent of APIConnectionError). This means local model inference errors—which are not transient and won't resolve on retry—will be retried up to max_retry times, wasting time and obscuring the real error. While some other plugins follow a similar pattern, those are wrapping actual network calls where retries make sense. For a local model, a more targeted exception filter (e.g., only catching FunASR-specific errors) would be more appropriate. Not flagged as a bug because this pattern exists in other plugins, but it's worth noting.

devin-ai-integration · 2026-06-16T19:10:09Z

+class STT(stt.STT):
+    """FunASR self-hosted speech-to-text.
+
+    Runs FunASR models (SenseVoice, Paraformer, Fun-ASR-Nano) locally — no cloud
+    API. Non-streaming; LiveKit wraps it with a VAD StreamAdapter for agents.
+    """
+
+    def __init__(
+        self,
+        *,
+        model: str = _DEFAULT_MODEL,
+        language: str = "auto",
+        device: str = "cpu",
+        hub: str = "ms",
+        use_itn: bool = True,
+        vad_model: str | None = "fsmn-vad",
+    ) -> None:
+        super().__init__(capabilities=STTCapabilities(streaming=False, interim_results=False))
+        self._opts = _STTOptions(model=model, language=language, device=device, hub=hub, use_itn=use_itn)
+        self._vad_model = vad_model
+        self._model = None


🚩 Missing model and provider property overrides

The base STT class at livekit-agents/livekit/agents/stt/stt.py:161-182 defines model and provider properties that return "unknown" by default, with docstrings explicitly stating plugins should override them. Other STT plugins (deepgram at stt.py:199-203, openai at stt.py:199-203, fal at stt.py:47-52) all override these properties. This FunASR plugin does not, meaning metrics emitted by the base class (at stt.py:220-221) will report model_name="unknown" and model_provider="unknown", reducing observability. This is not a correctness bug but an incomplete integration.
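A rough sketch of what the overrides could look like. `BaseSTT` is a hypothetical stand-in for the real base class, kept only so the example runs on its own, and the provider string `"FunASR"` is an assumption rather than an established convention.

```python
class BaseSTT:
    """Hypothetical stand-in for the base STT class described above."""

    @property
    def model(self) -> str:
        return "unknown"

    @property
    def provider(self) -> str:
        return "unknown"


class FunASRSTT(BaseSTT):
    """Hypothetical plugin class that reports its model and provider."""

    def __init__(self, *, model: str = "iic/SenseVoiceSmall") -> None:
        self._model_name = model

    @property
    def model(self) -> str:
        # Report the configured model ID instead of "unknown".
        return self._model_name

    @property
    def provider(self) -> str:
        # Assumed label; pick whatever the project's metrics expect.
        return "FunASR"
```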

devin-ai-integration

Devin Review found 2 new potential issues.

devin-ai-integration · 2026-06-17T10:15:25Z

+        except Exception as e:  # noqa: BLE001
+            raise APIConnectionError() from e


🟡 Blanket Exception catch wraps non-transient local errors as APIConnectionError, causing futile retries

At lines 99-100, every exception (including KeyError, RuntimeError, torch.cuda.OutOfMemoryError, model-loading failures, etc.) is caught and re-raised as APIConnectionError. The base class recognize() method (livekit-agents/livekit/agents/stt/stt.py:204-248) catches APIError (parent of APIConnectionError) and retries up to conn_options.max_retry times (default 3). Since this is a local model with no network involved, none of these errors are transient connection problems — retrying an OOM or a model-loading failure 3 times is wasteful and delays the real error from surfacing. Other plugins (e.g., livekit-plugins-fal/livekit/plugins/fal/stt.py:84) catch only the provider-specific exception class.

Prompt for agents

The broad except Exception clause at line 99 catches all errors and wraps them as APIConnectionError, which causes the base class to retry them. For a local inference model, most errors (OOM, model load failure, bad audio format) are non-transient and should not be retried. Consider either: (1) narrowing the catch to only FunASR-specific exceptions that could be transient, or (2) re-raising non-transient errors directly without wrapping in APIConnectionError. You may want to import specific exception types from funasr if available, and let programming errors like KeyError/TypeError propagate naturally.
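One way option (1) could look, as a sketch under stated assumptions. `APIConnectionError` here is a local stand-in for the LiveKit class, and `TransientBackendError` is a hypothetical exception type; whether FunASR exposes any error class worth retrying is not established in this thread.

```python
import asyncio


class APIConnectionError(Exception):
    """Stand-in for the retriable LiveKit error type."""


class TransientBackendError(Exception):
    """Hypothetical: the only error class considered worth retrying."""


async def recognize(run):
    """Run the blocking callable `run` in the default executor.

    Only TransientBackendError is wrapped as APIConnectionError; anything
    else (ValueError, KeyError, out-of-memory errors, ...) propagates
    unchanged so it surfaces immediately instead of being retried.
    """
    loop = asyncio.get_running_loop()
    try:
        return await loop.run_in_executor(None, run)
    except TransientBackendError as e:
        raise APIConnectionError() from e
```

The sketch uses `asyncio.get_running_loop()` rather than `asyncio.get_event_loop()`, which is the recommended call from inside a coroutine.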

devin-ai-integration · 2026-06-17T10:15:26Z

+        return stt.SpeechEvent(
+            type=SpeechEventType.FINAL_TRANSCRIPT,
+            alternatives=[stt.SpeechData(text=text, language=str(lang))],
+        )


🚩 Language "auto" is passed to SenseVoice model — valid but semantically lossy in response

When using the default SenseVoice model with default language "auto", line 91's condition "SenseVoice" in self._opts.model is always true, so gen_kwargs["language"] = "auto" is always set. SenseVoice supports this (it auto-detects language). However, the response at line 104 reports the language as "auto" in SpeechData.language, rather than the actually-detected language. FunASR's result object may contain detected language info that could be extracted. This isn't incorrect (the fal plugin also passes through the configured language), but it means downstream consumers can't know what language was actually spoken.
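A hedged sketch of how the detected language might be recovered. It assumes the raw SenseVoice text is prefixed with tags such as `<|zh|><|NEUTRAL|><|Speech|><|withitn|>`, which should be verified against the FunASR output in use. `split_tags` and the language set are hypothetical names introduced for illustration.

```python
import re

# Assumption: tags look like <|zh|>, <|NEUTRAL|>, <|Speech|>, <|withitn|>.
_TAG = re.compile(r"<\|([^|]*)\|>")

# Assumption: language codes SenseVoice may emit as a tag.
_LANGUAGE_TAGS = {"zh", "en", "yue", "ja", "ko"}


def split_tags(raw: str, fallback: str = "auto") -> tuple[str, str]:
    """Return (clean_text, language) from a tagged transcript.

    Falls back to `fallback` when no known language tag is present, which
    matches the current behaviour of reporting the configured language.
    """
    tags = _TAG.findall(raw)
    text = _TAG.sub("", raw).strip()
    language = next((t for t in tags if t in _LANGUAGE_TAGS), fallback)
    return text, language
```

The returned language could then be passed to `SpeechData(language=...)` in place of the configured value.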

LauraGPT requested a review from a team as a code owner June 16, 2026 19:06

devin-ai-integration Bot reviewed Jun 16, 2026

LauraGPT force-pushed the feat/funasr-stt-plugin branch from 831964d to b68cd98 Compare June 17, 2026 10:10

devin-ai-integration Bot reviewed Jun 17, 2026

LauraGPT wants to merge 1 commit into livekit:main from LauraGPT:feat/funasr-stt-plugin

2 participants
