166 changes: 159 additions & 7 deletions livekit-plugins/livekit-plugins-resemble/README.md
@@ -1,6 +1,7 @@
# Resemble plugin for LiveKit Agents

Support for voice synthesis with the [Resemble AI](https://www.resemble.ai/) API, using both their REST API and WebSocket streaming interface.
Support for [Resemble AI](https://www.resemble.ai/) voice synthesis, real-time
deepfake detection, and fraud/scam-intent scoring in LiveKit Agents.

See [https://docs.livekit.io/agents/integrations/tts/resemble/](https://docs.livekit.io/agents/integrations/tts/resemble/) for more information.

@@ -12,13 +13,14 @@ pip install livekit-plugins-resemble

## Pre-requisites

You'll need an API key from Resemble AI. It can be set as an environment variable: `RESEMBLE_API_KEY`
You'll need an API key from Resemble AI. It can be set as an environment variable:
`RESEMBLE_API_KEY`.

Additionally, you'll need the voice UUID from your Resemble AI account.
For TTS, you'll also need the voice UUID from your Resemble AI account.

## Examples

### Recommended
### Text-to-speech

```python
import asyncio
@@ -31,8 +33,7 @@ async def run_tts_example():
        voice_uuid="your_voice_uuid",
        # Optional parameters
        sample_rate=44100,  # Sample rate in Hz (default: 44100)
        precision="PCM_16",  # Audio precision (PCM_32, PCM_24, PCM_16, MULAW)
        output_format="wav",  # Output format (wav or mp3)
        model="chatterbox-turbo",
    ) as tts:
        # One-off synthesis (uses REST API)
        audio_stream = tts.synthesize("Hello, world!")
@@ -59,6 +60,157 @@ async def run_tts_example():
asyncio.run(run_tts_example())
```

### Deepfake detection

Add Resemble Detect to a LiveKit room with a small, event-driven surface:

```python
from livekit.plugins import resemble

detect = resemble.ResembleDetect(security="standard")
detect.attach(ctx.room) # auto-subscribes to the first remote microphone track


def on_synthetic(result: resemble.DetectionResult) -> None:
    # result.label is the raw Detect label ("fake"); normalized_label is app-facing.
    if result.normalized_label == "synthetic":
        # pause account actions, ask for step-up verification, or escalate
        ...


detect.on("synthetic_detected", on_synthetic)
```

Each result exposes a stable app payload:

```python
{
    "label": "synthetic",
    "score": 0.86,
    "confidence": 0.9,
    "window_ts": 41.2,
    "scan_index": 3,
    "is_final": False,
}
```

Use `result.to_dict()` or `verdict.to_dict()` if you want this shape directly.
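
As a rough sketch of consuming that payload, the snippet below turns a result dictionary into a JSON log line when it is labeled synthetic and clears a score cutoff. It operates on a plain dictionary shaped like the example above, not on the plugin's own classes; the helper name `format_detection_log` and the `min_score` default of 0.75 are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Optional


def format_detection_log(payload: Dict[str, Any], min_score: float = 0.75) -> Optional[str]:
    """Return a JSON log line for synthetic results at or above min_score, else None."""
    if payload.get("label") != "synthetic":
        return None
    if float(payload.get("score", 0.0)) < min_score:
        return None
    # sort_keys keeps log lines stable and easy to compare
    return json.dumps(payload, sort_keys=True)


# Hypothetical payload copied from the shape documented above.
example = {
    "label": "synthetic",
    "score": 0.86,
    "confidence": 0.9,
    "window_ts": 41.2,
    "scan_index": 3,
    "is_final": False,
}
line = format_detection_log(example)
```

Keeping the log format as plain JSON means the same line can be shipped to an audit trail or a metrics pipeline without depending on the plugin's types.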

### Fraud and scam-intent scoring

Use Resemble Signal when you want to score text, audio, video, or image content against
fraud and scam categories. It is a companion to Detect: Detect verifies media
authenticity, while Signal identifies suspicious intent.

```python
from livekit.plugins import resemble

signal = resemble.ResembleSignal()

result = await signal.score_text(
    "Hi, it's me. Please read me the reset code you just received."
)

if result.verdict == "fraud":
    # block the action, step up verification, or escalate to a human reviewer
    print(result.top_category.name if result.top_category else "fraud")
```

Each Signal result exposes a stable app payload:

```python
{
    "verdict": "fraud",
    "score": 0.91,
    "recommended_action": "block",
    "input_modality": "text",
    "top_category": {
        "name": "CEO Impersonation / Wire Diversion",
        "score": 0.91,
        "icon": "wire",
    },
}
```
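
One way an application might act on this payload is sketched below. It reads only the keys shown in the example above and works on a plain dictionary; the helper names `should_block` and `describe`, and the local `score_floor` of 0.9, are hypothetical and not part of the plugin.

```python
from typing import Any, Dict


def should_block(payload: Dict[str, Any], score_floor: float = 0.9) -> bool:
    """Block only when Signal recommends it and the score clears a local floor."""
    return (
        payload.get("recommended_action") == "block"
        and float(payload.get("score", 0.0)) >= score_floor
    )


def describe(payload: Dict[str, Any]) -> str:
    """Build a short reviewer-facing summary, tolerating a missing top_category."""
    category = payload.get("top_category") or {}
    name = category.get("name", "uncategorized")
    score = float(payload.get("score", 0.0))
    return f"{payload.get('verdict', 'unknown')}: {name} ({score:.2f})"


# Hypothetical payload copied from the shape documented above.
signal_payload = {
    "verdict": "fraud",
    "score": 0.91,
    "recommended_action": "block",
    "input_modality": "text",
    "top_category": {
        "name": "CEO Impersonation / Wire Diversion",
        "score": 0.91,
        "icon": "wire",
    },
}
summary = describe(signal_payload)
```

Applying a local floor on top of `recommended_action` is a design choice, not something the source prescribes; it simply lets an application be more conservative than the service's own recommendation.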

Signal can also score media files and manage custom fraud categories:

```python
await signal.score_file(
    wav_bytes,
    filename="call.wav",
    media_type="audio",
    content_type="audio/wav",
)

await signal.create_custom_category(
    name="Tech Support Scam",
    scenarios=[
        "Your computer has a virus, press 1 to continue",
        "We detected suspicious activity, verify your identity now",
    ],
)
```

Signal uses the same bearer authorization style as the rest of this plugin. If you need to
override it, pass the full `Bearer ...` value as `api_key`.

### Detection options

1. **Standard security** - default for most calls. Checks a 4s speech window early, samples
across the call, and emits `synthetic_detected` only after 2-of-3 recent checks agree.

```python
detect = resemble.ResembleDetect(security="standard")
```

2. **Spot check** - lowest cost. Runs one check once enough speech is available.

```python
detect = resemble.ResembleDetect(security="spot")
```

3. **High security** - continuous monitoring for sensitive workflows.

```python
detect = resemble.ResembleDetect(security="high")
```

4. **Custom policy** - override any preset with simple keyword arguments.

```python
detect = resemble.ResembleDetect(
    security="standard",
    window_seconds=4.0,
    sample_interval_seconds=20.0,
    fake_threshold=0.75,
    agreement_window=3,
    min_fake_results=2,
    zero_retention_mode=True,
    extra_form_fields={"use_ood_detector": True},
)
```

5. **Custom transport** - keep the LiveKit integration logic but swap how audio reaches
Detect. This is useful for a streaming Detect backend, a gateway, or tests.

```python
detect = resemble.ResembleDetect(transport=my_detect_transport)
```

For a sensitive action such as a password reset, request a fresh check before proceeding:

```python
detect.check_now()
```

`fake_detected` is still emitted for every raw window that crosses `fake_threshold`.
Production agents should usually act on `synthetic_detected`, which applies the configured
agreement policy.
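
The agreement policy can be pictured as a sliding k-of-n vote over recent windows. The sketch below is a standalone illustration of that idea, not the plugin's implementation: the class name `AgreementPolicy` is hypothetical, the parameter names mirror the keyword arguments shown earlier, and treating "crosses" as `>=` is an assumption.

```python
from collections import deque
from typing import Deque


class AgreementPolicy:
    """Report agreement when enough of the most recent scores reach the threshold."""

    def __init__(
        self,
        fake_threshold: float = 0.75,
        agreement_window: int = 3,
        min_fake_results: int = 2,
    ) -> None:
        self.fake_threshold = fake_threshold
        self.min_fake_results = min_fake_results
        # deque(maxlen=...) silently drops the oldest vote once the window is full
        self._recent: Deque[bool] = deque(maxlen=agreement_window)

    def update(self, score: float) -> bool:
        """Record one window score and return True if the policy is satisfied."""
        self._recent.append(score >= self.fake_threshold)
        return sum(self._recent) >= self.min_fake_results


policy = AgreementPolicy()
decisions = [policy.update(s) for s in [0.9, 0.2, 0.8, 0.1, 0.3]]
```

With the defaults above, a single high-scoring window never triggers on its own; the third score is the first point at which two of the last three windows agree.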

The default REST transport uploads short WAV windows directly to Detect with `Prefer: wait`
and `zero_retention_mode=True`. Pass `extra_form_fields` for advanced Detect options, or pass
a custom `transport` when using a streaming Detect backend or gateway.
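
To make the "short WAV windows" step concrete, here is a minimal sketch that packages the most recent few seconds of 16-bit PCM into an in-memory WAV file using only the standard library. The function name `pcm16_to_wav_window` and the 16 kHz mono defaults are assumptions for illustration; the plugin's actual framing and sample rate may differ.

```python
import io
import wave


def pcm16_to_wav_window(
    pcm: bytes,
    sample_rate: int = 16000,
    window_seconds: float = 4.0,
    channels: int = 1,
) -> bytes:
    """Wrap the last window_seconds of 16-bit PCM audio in a WAV container."""
    bytes_per_frame = 2 * channels  # 2 bytes per 16-bit sample
    # Drop any trailing partial frame so slicing stays frame-aligned.
    pcm = pcm[: len(pcm) - (len(pcm) % bytes_per_frame)]
    max_bytes = int(sample_rate * window_seconds) * bytes_per_frame
    window = pcm[-max_bytes:] if max_bytes > 0 else b""

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(window)
    return buf.getvalue()


# Six seconds of placeholder audio, trimmed to the final four seconds.
wav_bytes = pcm16_to_wav_window(b"\x00\x01" * 16000 * 6)
```

The resulting bytes could then be handed to whatever upload path a transport uses; building the WAV in memory avoids writing call audio to disk.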

### Alternative: Manual Resource Management

If you prefer to manage resources manually, make sure to properly clean up:
@@ -117,4 +269,4 @@ This plugin uses two different approaches to generate speech:
1. **One-off Synthesis** - Uses Resemble's REST API for simple text-to-speech conversion
2. **Streaming Synthesis** - Uses Resemble's WebSocket API for real-time streaming synthesis

The WebSocket streaming API is only available for Resemble AI Business plan users.
@@ -17,11 +17,53 @@
See https://docs.livekit.io/agents/integrations/tts/resemble/ for more information.
"""

from .detect import (
    DetectionAction,
    DetectionMonitor,
    DetectionResult,
    DetectionSecurity,
    DetectionVerdict,
    DetectTransport,
    ResembleDetect,
    RestDetectTransport,
)
from .models import TTSModels
from .signal import (
    ResembleSignal,
    RestSignalTransport,
    SignalAction,
    SignalCategoryScore,
    SignalModality,
    SignalResult,
    SignalTransport,
    SignalVerdict,
)
from .tts import TTS, ChunkedStream, SynthesizeStream
from .version import __version__

__all__ = ["TTS", "TTSModels", "ChunkedStream", "SynthesizeStream", "__version__"]
__all__ = [
    "TTS",
    "TTSModels",
    "ChunkedStream",
    "SynthesizeStream",
    "ResembleDetect",
    "DetectionMonitor",
    "DetectionResult",
    "DetectionVerdict",
    "DetectionAction",
    "DetectionSecurity",
    "DetectTransport",
    "RestDetectTransport",
    "ResembleSignal",
    "SignalResult",
    "SignalCategoryScore",
    "SignalVerdict",
    "SignalModality",
    "SignalAction",
    "SignalTransport",
    "RestSignalTransport",
    "__version__",
]

from livekit.agents import Plugin
