diff --git a/api-reference/openapi.json b/api-reference/openapi.json
index 8d2659f..cb160f6 100644
--- a/api-reference/openapi.json
+++ b/api-reference/openapi.json
@@ -1345,11 +1345,7 @@
"type": "string"
},
"train_mode": {
- "default": "full",
- "enum": [
- "fast",
- "full"
- ],
+ "const": "fast",
"title": "Train Mode",
"type": "string"
},
@@ -1647,7 +1643,7 @@
"type": "string"
},
"train_mode": {
- "default": "full",
+ "default": "fast",
"enum": [
"fast",
"full"
@@ -4052,7 +4048,7 @@
"type": "string"
},
"train_mode": {
- "default": "full",
+ "default": "fast",
"enum": [
"fast",
"full"
diff --git a/developer-guide/core-features/emotions.mdx b/developer-guide/core-features/emotions.mdx
index c66f8b6..fae52ea 100644
--- a/developer-guide/core-features/emotions.mdx
+++ b/developer-guide/core-features/emotions.mdx
@@ -60,9 +60,17 @@ The S2 TTS models will interpret these markers and adjust the voice accordingly.
-### Tone Markers (5 expressions)
+## Sound & Delivery Markers
-Control volume and intensity:
+These markers aren't emotions — they shape *how* a line is delivered, add natural human sounds, or layer in ambient effects. Combine them with the emotion cues above.
+
+### Tone Markers (6 expressions)
+
+Control volume, intensity, and emphasis. Place `[emphasis]` right before the word or phrase you want to stress:
+
+```text
+This is [emphasis] really important.
+```
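
Reviewer sketch for the `[emphasis]` placement rule added above: a small helper that puts the marker right before a chosen phrase. The name `add_emphasis` is hypothetical and not part of any Fish Audio SDK; it only does string manipulation and assumes the marker syntax shown in this hunk.

```python
def add_emphasis(text: str, phrase: str) -> str:
    """Insert the [emphasis] marker right before the first occurrence of phrase.

    Hypothetical helper for illustration; returns text unchanged if the
    phrase does not occur.
    """
    index = text.find(phrase)
    if index == -1:
        return text
    # Keep everything before the phrase, then the marker, then the phrase onward.
    return f"{text[:index]}[emphasis] {text[index:]}"


if __name__ == "__main__":
    print(add_emphasis("This is really important.", "really important"))
```

Only the first occurrence is marked, so repeated words elsewhere in the line are not stressed by accident.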
diff --git a/features/realtime-streaming.mdx b/features/realtime-streaming.mdx
index 2dd90bc..005c130 100644
--- a/features/realtime-streaming.mdx
+++ b/features/realtime-streaming.mdx
@@ -151,7 +151,7 @@ for chunk in client.tts.stream_websocket(script(), reference_id="YOUR_VOICE_ID")
Both streaming paths take a `latency` mode:
-- `latency="balanced"` (default) — lowest time-to-first-audio. Use it for voice agents and live LLM output.
+- `latency="balanced"` (Python SDK default) — lowest time-to-first-audio the SDK offers. Use it for voice agents and live LLM output.
- `latency="normal"` — slightly higher latency, best audio quality. Use it for narration where you can afford a beat.
```python
@@ -159,6 +159,18 @@ for chunk in client.tts.stream_websocket(llm_tokens(), latency="balanced"):
...
```
+
+ **Set `latency` explicitly for real-time use.** The Python SDK defaults to `balanced`, but the raw HTTP/WebSocket API defaults to `normal`, which is tuned for quality and noticeably increases time-to-first-audio — you may wait several seconds for the first chunk. If you call the API directly, or through a third-party integration such as the LiveKit plugin, pass `balanced` (or `low`) for interactive latency.
+
+
+The available modes differ slightly between the raw API and the SDK:
+
+| Mode | Raw HTTP/WebSocket API | Python SDK | Behavior |
+| ---------- | ---------------------- | ------------- | ---------------------------------------------- |
+| `low` | Supported | Not available | Lowest latency |
+| `balanced` | Supported | Default | Reduced latency — recommended for real-time |
+| `normal` | Default | Supported | Best quality, highest time-to-first-audio |
+
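
Reviewer sketch of the defaults described in the table above: it resolves which latency mode takes effect for each client when the caller passes nothing, and rejects modes a client does not support. The names `resolve_latency` and `build_raw_tts_payload`, and the `text` payload field, are assumptions for illustration; only the mode names and defaults come from this diff.

```python
from typing import Optional

# Supported modes and defaults per client, copied from the table above.
SUPPORTED_MODES = {
    "raw_api": ("low", "balanced", "normal"),
    "python_sdk": ("balanced", "normal"),
}
DEFAULT_MODE = {"raw_api": "normal", "python_sdk": "balanced"}


def resolve_latency(client: str, latency: Optional[str] = None) -> str:
    """Return the latency mode that takes effect for the given client."""
    if client not in SUPPORTED_MODES:
        raise ValueError(f"unknown client: {client!r}")
    # Fall back to the client's documented default when nothing is passed.
    mode = DEFAULT_MODE[client] if latency is None else latency
    if mode not in SUPPORTED_MODES[client]:
        raise ValueError(f"{mode!r} is not available for {client}")
    return mode


def build_raw_tts_payload(text: str, latency: str = "balanced") -> dict:
    """Build a request body that always carries an explicit latency.

    Hypothetical payload shape: the `text` field name is an assumption.
    """
    return {"text": text, "latency": resolve_latency("raw_api", latency)}


if __name__ == "__main__":
    print(resolve_latency("raw_api"))      # default when omitted
    print(build_raw_tts_payload("Hello"))  # explicit mode for real-time use
```

Defaulting the helper to `balanced` instead of inheriting the raw API default keeps direct callers and third-party integrations on the real-time path unless they opt into `normal`.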
For finer control, pass a `TTSConfig` with chunk tuning. Smaller chunks emit audio sooner (lower latency); larger chunks give the model more context (smoother prosody):
```python
diff --git a/features/text-to-speech.mdx b/features/text-to-speech.mdx
index 0d9a104..e84c505 100644
--- a/features/text-to-speech.mdx
+++ b/features/text-to-speech.mdx
@@ -215,10 +215,15 @@ audio = client.tts.convert(
`latency` trades stability for speed; `chunk_length` controls how much text the engine batches before it starts generating.
-- `latency="balanced"` (default) — lower time-to-first-audio (~300ms). Good for interactive use.
+- `latency="balanced"` (Python SDK default) — lower time-to-first-audio (~300ms). Good for interactive use.
- `latency="normal"` — most stable output, at slightly higher latency.
+- `latency="low"` (raw API only) — lowest latency.
- `chunk_length` (`100`–`300`, default `200`) — smaller chunks start audio sooner; larger chunks are more efficient for long text.
+
+ The raw HTTP/WebSocket API defaults `latency` to `normal` (quality-tuned), while the Python SDK defaults to `balanced`. For real-time use over the raw API, set `latency` to `balanced` or `low` explicitly — see [Tune latency vs. quality](/features/realtime-streaming#tune-latency-vs-quality).
+
+
```python Python
from fishaudio.types import TTSConfig
diff --git a/snippets/emotion-list-tones-s2.mdx b/snippets/emotion-list-tones-s2.mdx
index e4d83d1..b979abe 100644
--- a/snippets/emotion-list-tones-s2.mdx
+++ b/snippets/emotion-list-tones-s2.mdx
@@ -5,3 +5,4 @@
| Screaming | `[screaming]` | Very loud, panicked | Emergencies, fear |
| Whispering | `[whispering]` | Very soft, secretive | Secrets, quiet scenes |
| Soft | `[soft tone]` | Gentle, quiet | Comfort, lullabies |
+| Emphasis | `[emphasis]` | Stress a word/phrase | Highlighting key words |