21 changes: 21 additions & 0 deletions livekit-plugins/livekit-plugins-funasr/README.md
@@ -0,0 +1,21 @@
# LiveKit Plugins FunASR

Self-hosted speech-to-text for LiveKit Agents using [FunASR](https://github.com/modelscope/FunASR) — SenseVoice, Paraformer, Fun-ASR-Nano. Runs **locally, no cloud API**, strong on Chinese and 50+ languages.

## Install
```bash
pip install livekit-plugins-funasr
```

## Usage
```python
from livekit.plugins import funasr

# ModelScope (default hub="ms")
stt = funasr.STT(model="iic/SenseVoiceSmall", device="cuda")

# HuggingFace
stt = funasr.STT(model="FunAudioLLM/SenseVoiceSmall", hub="hf", device="cuda")
```

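This is a non-streaming STT. For real-time agents, LiveKit wraps it in a VAD-driven `StreamAdapter`, which buffers audio until the VAD detects the end of an utterance and then transcribes that segment with a single `recognize()` call.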
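To make the "VAD plus non-streaming STT" idea concrete, here is a simplified sketch of VAD-gated segmentation. It is not LiveKit's `StreamAdapter` code: `frame_is_speech` is a toy energy-threshold VAD, and `segment_utterances`, the `0.02` threshold, and the `hangover` count are all hypothetical. The point is only that each finished utterance becomes exactly one call to a batch recognizer such as this plugin's `recognize()`.

```python
from typing import Callable, Iterable, List

Frame = List[float]  # one short chunk of mono samples in [-1.0, 1.0]


def frame_is_speech(frame: Frame, threshold: float = 0.02) -> bool:
    """Toy VAD: a frame is speech if its mean absolute amplitude exceeds `threshold`."""
    if not frame:
        return False
    return sum(abs(s) for s in frame) / len(frame) > threshold


def segment_utterances(
    frames: Iterable[Frame],
    recognize: Callable[[List[float]], str],
    hangover: int = 2,
) -> List[str]:
    """Buffer speech frames and call `recognize` once per utterance.

    An utterance ends after `hangover` consecutive non-speech frames, so short
    pauses stay inside a single segment.
    """
    transcripts: List[str] = []
    buffer: List[float] = []
    silent_run = 0
    for frame in frames:
        if frame_is_speech(frame):
            buffer.extend(frame)
            silent_run = 0
        elif buffer:
            silent_run += 1
            if silent_run >= hangover:
                transcripts.append(recognize(buffer))  # one non-streaming call per utterance
                buffer, silent_run = [], 0
    if buffer:  # flush trailing speech when the input ends
        transcripts.append(recognize(buffer))
    return transcripts


speech = [0.5] * 160
silence = [0.0] * 160
frames = [speech, speech, silence, silence, speech, silence]
result = segment_utterances(frames, recognize=lambda audio: f"{len(audio)} samples")
print(result)  # ['320 samples', '160 samples']
```

A real adapter uses a trained VAD (for example Silero) and streams `rtc.AudioFrame`s rather than float lists, but the control flow of "accumulate, detect end of speech, recognize once" is the same.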
15 changes: 15 additions & 0 deletions livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/__init__.py
@@ -0,0 +1,15 @@
"""FunASR plugin for LiveKit Agents — self-hosted speech-to-text (SenseVoice / Paraformer / Fun-ASR-Nano)."""
from livekit.agents import Plugin
from .log import logger
from .stt import STT
from .version import __version__

Check failure on line 5 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/__init__.py (GitHub Actions / ruff)
ruff (I001): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/__init__.py:2:1: Import block is un-sorted or un-formatted. Help: Organize imports.

__all__ = ["STT", "__version__"]


class FunASRPlugin(Plugin):
    def __init__(self) -> None:
        super().__init__(__name__, __version__, __package__, logger)


Plugin.register_plugin(FunASRPlugin())
2 changes: 2 additions & 0 deletions livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/log.py
@@ -0,0 +1,2 @@
import logging

Check failure on line 1 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/log.py (GitHub Actions / ruff)
ruff (I001): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/log.py:1:1: Import block is un-sorted or un-formatted. Help: Organize imports.
logger = logging.getLogger("livekit.plugins.funasr")
Empty file.
105 changes: 105 additions & 0 deletions livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py
@@ -0,0 +1,105 @@
from __future__ import annotations

import asyncio
import io
from dataclasses import dataclass

import numpy as np

Check failure on line 7 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py (GitHub Actions / ruff)
ruff (F401): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py:7:17: `numpy` imported but unused. Help: Remove unused import: `numpy`.

from livekit import rtc
from livekit.agents import (
    DEFAULT_API_CONNECT_OPTIONS,
    APIConnectionError,
    APIConnectOptions,
    stt,
)
from livekit.agents.stt import SpeechEventType, STTCapabilities
from livekit.agents.types import NOT_GIVEN, NotGivenOr
from livekit.agents.utils import AudioBuffer, is_given

from .log import logger

_DEFAULT_MODEL = "iic/SenseVoiceSmall"
_TARGET_SR = 16000


@dataclass
class _STTOptions:
model: str = _DEFAULT_MODEL
language: str = "auto"
device: str = "cpu"
hub: str = "ms"
use_itn: bool = True


class STT(stt.STT):
    """FunASR self-hosted speech-to-text.

    Runs FunASR models (SenseVoice, Paraformer, Fun-ASR-Nano) locally — no cloud
    API. Non-streaming; LiveKit wraps it with a VAD StreamAdapter for agents.
    """

    def __init__(
        self,
        *,
        model: str = _DEFAULT_MODEL,
        language: str = "auto",
        device: str = "cpu",
        hub: str = "ms",
        use_itn: bool = True,
        vad_model: str | None = "fsmn-vad",
    ) -> None:
        super().__init__(capabilities=STTCapabilities(streaming=False, interim_results=False))
        self._opts = _STTOptions(model=model, language=language, device=device, hub=hub, use_itn=use_itn)
        self._vad_model = vad_model
        self._model = None
Comment on lines +35 to +55

Contributor

🚩 Missing model and provider property overrides

The base STT class at livekit-agents/livekit/agents/stt/stt.py:161-182 defines model and provider properties that return "unknown" by default, with docstrings explicitly stating plugins should override them. Other STT plugins (deepgram at stt.py:199-203, openai at stt.py:199-203, fal at stt.py:47-52) all override these properties. This FunASR plugin does not, meaning metrics emitted by the base class (at stt.py:220-221) will report model_name="unknown" and model_provider="unknown", reducing observability. This is not a correctness bug but an incomplete integration.



    def _ensure_model(self):
        if self._model is None:
            from funasr import AutoModel

            kwargs = dict(model=self._opts.model, device=self._opts.device, hub=self._opts.hub, disable_update=True)

Check failure on line 61 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py (GitHub Actions / ruff)
ruff (C408): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py:61:22: Unnecessary `dict()` call (rewrite as a literal). Help: Rewrite as a literal.

            if self._vad_model:
                kwargs.update(vad_model=self._vad_model, vad_kwargs={"max_single_segment_time": 30000})
            logger.info("loading FunASR model %s on %s", self._opts.model, self._opts.device)
            self._model = AutoModel(**kwargs)
        return self._model
Comment on lines +57 to +66

Contributor

🔴 No thread-safety for lazy model init and concurrent inference in thread executor

_ensure_model() is called from _run() which executes in a thread pool via run_in_executor (stt.py:98). There is no lock guarding the check-then-set on self._model (stt.py:58), so concurrent _recognize_impl calls can race: two threads both see self._model is None, both load the model (wasting resources and time), and one loaded instance is silently discarded. More critically, after the model is initialized, concurrent _run invocations will call model.generate() simultaneously on the same FunASR/PyTorch model instance. PyTorch models are not thread-safe for forward passes (they share internal buffers), which can cause crashes (especially on CUDA) or silently produce incorrect transcription results.

Prompt for agents
The _ensure_model method and subsequent model.generate() call in _run are not protected by any lock, yet they run in a thread pool executor where concurrent execution is possible. Add a threading.Lock to the STT class (initialized in __init__) and acquire it in the _run function (or at minimum around _ensure_model and model.generate). This ensures: (1) the model is loaded exactly once, and (2) inference calls are serialized to avoid PyTorch thread-safety issues. For example, in __init__ add self._lock = threading.Lock(), then in _run wrap the body with 'with self._lock:'. Alternatively, separate initialization locking (which only needs to guard _ensure_model) from inference locking (which guards model.generate).


    async def _recognize_impl(
        self,
        buffer: AudioBuffer,
        *,
        language: NotGivenOr[str] = NOT_GIVEN,
        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    ) -> stt.SpeechEvent:
        lang = language if is_given(language) else self._opts.language
        wav_bytes = rtc.combine_audio_frames(buffer).to_wav_bytes()

        def _run() -> str:
            import soundfile as sf
            from funasr.utils.postprocess_utils import rich_transcription_postprocess

Check failure on line 80 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py (GitHub Actions / ruff)
ruff (I001): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py:79:13: Import block is un-sorted or un-formatted. Help: Organize imports.

            model = self._ensure_model()
            audio, sr = sf.read(io.BytesIO(wav_bytes), dtype="float32")
            if audio.ndim > 1:
                audio = audio.mean(axis=1)
            if sr != _TARGET_SR:
                import librosa

                audio = librosa.resample(audio, orig_sr=sr, target_sr=_TARGET_SR)
            gen_kwargs = dict(input=audio, cache={}, use_itn=self._opts.use_itn, batch_size_s=300)

Check failure on line 90 in livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py (GitHub Actions / ruff)
ruff (C408): livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py:90:26: Unnecessary `dict()` call (rewrite as a literal). Help: Rewrite as a literal.

            if "SenseVoice" in self._opts.model or (lang and lang != "auto"):
                gen_kwargs["language"] = lang
            res = model.generate(**gen_kwargs)
            text = res[0]["text"] if res else ""
            return rich_transcription_postprocess(text)

        try:
            text = await asyncio.get_event_loop().run_in_executor(None, _run)
        except Exception as e:  # noqa: BLE001
            raise APIConnectionError() from e
Comment on lines +97 to +100

Contributor

🚩 Broad exception catch masks local errors as retriable APIConnectionError

At stt.py:99-100, all exceptions (including KeyError, ValueError, CUDA OOM, etc.) are caught and re-raised as APIConnectionError. The base class recognize() method at livekit-agents/livekit/agents/stt/stt.py:204-248 retries on APIError (parent of APIConnectionError). This means local model inference errors—which are not transient and won't resolve on retry—will be retried up to max_retry times, wasting time and obscuring the real error. While some other plugins follow a similar pattern, those are wrapping actual network calls where retries make sense. For a local model, a more targeted exception filter (e.g., only catching FunASR-specific errors) would be more appropriate. Not flagged as a bug because this pattern exists in other plugins, but it's worth noting.
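The retry cost described above can be reproduced with stand-ins. `APIError` and `APIConnectionError` here only mimic the LiveKit hierarchy, and `recognize_with_retries` is a hypothetical, simplified retry loop (no backoff) that assumes one initial attempt plus `max_retry` retries on `APIError`; it is not LiveKit's actual `recognize()` code. The same deterministic `KeyError` runs four times when wrapped, once when it is allowed to propagate.

```python
from __future__ import annotations

from typing import Callable


class APIError(Exception):
    """Stand-in for the retriable error base class."""


class APIConnectionError(APIError):
    """Stand-in for the error the plugin raises."""


def recognize_with_retries(impl: Callable[[], str], max_retry: int = 3) -> str:
    """Hypothetical retry loop: one attempt plus up to `max_retry` retries on APIError."""
    for attempt in range(max_retry + 1):
        try:
            return impl()
        except APIError:
            if attempt == max_retry:
                raise  # retries exhausted: surface the wrapped error
    raise AssertionError("unreachable")


def make_impl(counter: list[int], wrap: bool) -> Callable[[], str]:
    """Build an impl that always hits the same local bug (a KeyError)."""

    def impl() -> str:
        counter[0] += 1
        try:
            raise KeyError("text")  # deterministic: retrying cannot fix it
        except Exception as e:
            if wrap:
                raise APIConnectionError() from e  # mirrors the broad catch under review
            raise

    return impl


wrapped_calls = [0]
try:
    recognize_with_retries(make_impl(wrapped_calls, wrap=True), max_retry=3)
except APIConnectionError:
    pass

unwrapped_calls = [0]
try:
    recognize_with_retries(make_impl(unwrapped_calls, wrap=False), max_retry=3)
except KeyError:
    pass

print(wrapped_calls[0], unwrapped_calls[0])  # 4 1
```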


Comment on lines +99 to +100

Contributor

🟡 Blanket Exception catch wraps non-transient local errors as APIConnectionError, causing futile retries

At lines 99-100, every exception (including KeyError, RuntimeError, torch.cuda.OutOfMemoryError, model-loading failures, etc.) is caught and re-raised as APIConnectionError. The base class recognize() method (livekit-agents/livekit/agents/stt/stt.py:204-248) catches APIError (parent of APIConnectionError) and retries up to conn_options.max_retry times (default 3). Since this is a local model with no network involved, none of these errors are transient connection problems — retrying an OOM or a model-loading failure 3 times is wasteful and delays the real error from surfacing. Other plugins (e.g., livekit-plugins-fal/livekit/plugins/fal/stt.py:84) catch only the provider-specific exception class.

Prompt for agents
The broad except Exception clause at line 99 catches all errors and wraps them as APIConnectionError, which causes the base class to retry them. For a local inference model, most errors (OOM, model load failure, bad audio format) are non-transient and should not be retried. Consider either: (1) narrowing the catch to only FunASR-specific exceptions that could be transient, or (2) re-raising non-transient errors directly without wrapping in APIConnectionError. You may want to import specific exception types from funasr if available, and let programming errors like KeyError/TypeError propagate naturally.


        return stt.SpeechEvent(
            type=SpeechEventType.FINAL_TRANSCRIPT,
            alternatives=[stt.SpeechData(text=text, language=str(lang))],
        )
Comment on lines +102 to +105

Contributor

🚩 Language "auto" is passed to SenseVoice model — valid but semantically lossy in response

When using the default SenseVoice model with default language "auto", line 91's condition "SenseVoice" in self._opts.model is always true, so gen_kwargs["language"] = "auto" is always set. SenseVoice supports this (it auto-detects language). However, the response at line 104 reports the language as "auto" in SpeechData.language, rather than the actually-detected language. FunASR's result object may contain detected language info that could be extracted. This isn't incorrect (the fal plugin also passes through the configured language), but it means downstream consumers can't know what language was actually spoken.
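A sketch of recovering the detected language, assuming SenseVoice's raw `text` starts with rich tags such as `<|en|><|NEUTRAL|><|Speech|><|withitn|>` (the format shown in SenseVoice's examples) before `rich_transcription_postprocess` strips them. `extract_language` and `_KNOWN_LANGS` are hypothetical, and the language set is an assumption. Models that emit no such tag (Paraformer, Fun-ASR-Nano) fall back to the configured language.

```python
import re

# Assumption: language codes SenseVoice emits as leading tags.
_KNOWN_LANGS = {"zh", "en", "yue", "ja", "ko"}
_TAG_RE = re.compile(r"<\|([^|]+)\|>")


def extract_language(raw_text: str, fallback: str) -> str:
    """Return the first known language tag in `raw_text`, else `fallback`.

    Must run on the raw model output, before rich_transcription_postprocess
    removes the tags.
    """
    for tag in _TAG_RE.findall(raw_text):
        if tag in _KNOWN_LANGS:
            return tag
    return fallback


print(extract_language("<|en|><|NEUTRAL|><|Speech|><|withitn|>Hello.", "auto"))  # en
```

In the plugin, `_run` could call this on `res[0]["text"]` before post-processing and return the language alongside the transcript, so `SpeechData(language=...)` reports what was actually detected rather than `"auto"`.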


1 change: 1 addition & 0 deletions livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/version.py
@@ -0,0 +1 @@
__version__ = "0.1.0"
36 changes: 36 additions & 0 deletions livekit-plugins/livekit-plugins-funasr/pyproject.toml
@@ -0,0 +1,36 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "livekit-plugins-funasr"
dynamic = ["version"]
description = "FunASR (SenseVoice / Paraformer / Fun-ASR-Nano) self-hosted STT plugin for LiveKit Agents"
readme = "README.md"
license = "Apache-2.0"
requires-python = ">=3.10.0"
authors = [{ name = "LiveKit", email = "hello@livekit.io" }]
keywords = ["voice", "ai", "realtime", "audio", "livekit", "funasr", "speech-to-text", "asr"]
classifiers = [
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Topic :: Multimedia :: Sound/Audio",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
]
dependencies = ["livekit-agents>=1.6.0", "funasr>=1.1.0", "soundfile", "librosa"]

[project.urls]
Documentation = "https://docs.livekit.io"
Website = "https://livekit.io/"
Source = "https://github.com/livekit/agents"

[tool.hatch.version]
path = "livekit/plugins/funasr/version.py"

[tool.hatch.build.targets.wheel]
packages = ["livekit"]

[tool.hatch.build.targets.sdist]
include = ["/livekit"]