diff --git a/api-reference/openapi.json b/api-reference/openapi.json
index 8d2659f..cb160f6 100644
--- a/api-reference/openapi.json
+++ b/api-reference/openapi.json
@@ -1345,11 +1345,7 @@
                       "type": "string"
                     },
                     "train_mode": {
-                      "default": "full",
-                      "enum": [
-                        "fast",
-                        "full"
-                      ],
+                      "const": "fast",
                       "title": "Train Mode",
                       "type": "string"
                     },
@@ -1647,7 +1643,7 @@
                       "type": "string"
                     },
                     "train_mode": {
-                      "default": "full",
+                      "default": "fast",
                       "enum": [
                         "fast",
                         "full"
@@ -4052,7 +4048,7 @@
             "type": "string"
           },
           "train_mode": {
-            "default": "full",
+            "default": "fast",
             "enum": [
               "fast",
               "full"
diff --git a/developer-guide/core-features/emotions.mdx b/developer-guide/core-features/emotions.mdx
index c66f8b6..fae52ea 100644
--- a/developer-guide/core-features/emotions.mdx
+++ b/developer-guide/core-features/emotions.mdx
@@ -60,9 +60,17 @@ The S2 TTS models will interpret these markers and adjust the voice accordingly.
 
 <AdvancedEmotions />
 
-### Tone Markers (5 expressions)
+## Sound & Delivery Markers
 
-Control volume and intensity:
+These markers aren't emotions — they shape *how* a line is delivered, add natural human sounds, or layer in ambient effects. Combine them with the emotion cues above.
+
+### Tone Markers (6 expressions)
+
+Control volume, intensity, and emphasis. Place `[emphasis]` right before the word or phrase you want to stress:
+
+```text
+This is [emphasis] really important.
+```
 
 <ToneMarkers />
 
diff --git a/features/realtime-streaming.mdx b/features/realtime-streaming.mdx
index 2dd90bc..005c130 100644
--- a/features/realtime-streaming.mdx
+++ b/features/realtime-streaming.mdx
@@ -151,7 +151,7 @@ for chunk in client.tts.stream_websocket(script(), reference_id="YOUR_VOICE_ID")
 
 Both streaming paths take a `latency` mode:
 
-- `latency="balanced"` (default) — lowest time-to-first-audio. Use it for voice agents and live LLM output.
+- `latency="balanced"` (Python SDK default) — lowest time-to-first-audio the SDK offers. Use it for voice agents and live LLM output.
 - `latency="normal"` — slightly higher latency, best audio quality. Use it for narration where you can afford a beat.
 
 ```python
@@ -159,6 +159,18 @@ for chunk in client.tts.stream_websocket(llm_tokens(), latency="balanced"):
     ...
 ```
 
+<Warning>
+  **Set `latency` explicitly for real-time use.** The Python SDK defaults to `balanced`, but the raw HTTP/WebSocket API defaults to `normal`, which is tuned for quality and noticeably increases time-to-first-audio — you may wait several seconds for the first chunk. If you call the API directly, or through a third-party integration such as the LiveKit plugin, pass `balanced` (or `low`) for interactive latency.
+</Warning>
+
+The available modes differ slightly between the raw API and the SDK:
+
+| Mode       | Raw HTTP/WebSocket API | Python SDK    | Behavior                                       |
+| ---------- | ---------------------- | ------------- | ---------------------------------------------- |
+| `low`      | Supported              | Not available | Lowest latency                                 |
+| `balanced` | Supported              | Default       | Reduced latency — recommended for real-time    |
+| `normal`   | Default                | Supported     | Best quality, highest time-to-first-audio      |
+
 For finer control, pass a `TTSConfig` with chunk tuning. Smaller chunks emit audio sooner (lower latency); larger chunks give the model more context (smoother prosody):
 
 ```python
diff --git a/features/text-to-speech.mdx b/features/text-to-speech.mdx
index 0d9a104..e84c505 100644
--- a/features/text-to-speech.mdx
+++ b/features/text-to-speech.mdx
@@ -215,10 +215,15 @@ audio = client.tts.convert(
 
 `latency` trades stability for speed; `chunk_length` controls how much text the engine batches before it starts generating.
 
-- `latency="balanced"` (default) — lower time-to-first-audio (~300ms). Good for interactive use.
+- `latency="balanced"` (Python SDK default) — lower time-to-first-audio (~300ms). Good for interactive use.
 - `latency="normal"` — most stable output, at slightly higher latency.
+- `latency="low"` (raw API only) — lowest latency.
 - `chunk_length` (`100`–`300`, default `200`) — smaller chunks start audio sooner; larger chunks are more efficient for long text.
 
+<Note>
+  The raw HTTP/WebSocket API defaults `latency` to `normal` (quality-tuned), while the Python SDK defaults to `balanced`. For real-time use over the raw API, set `latency` to `balanced` or `low` explicitly — see [Tune latency vs. quality](/features/realtime-streaming#tune-latency-vs-quality).
+</Note>
+
 <CodeGroup>
 ```python Python
 from fishaudio.types import TTSConfig
diff --git a/snippets/emotion-list-tones-s2.mdx b/snippets/emotion-list-tones-s2.mdx
index e4d83d1..b979abe 100644
--- a/snippets/emotion-list-tones-s2.mdx
+++ b/snippets/emotion-list-tones-s2.mdx
@@ -5,3 +5,4 @@
 | Screaming  | `[screaming]`       | Very loud, panicked  | Emergencies, fear          |
 | Whispering | `[whispering]`      | Very soft, secretive | Secrets, quiet scenes      |
 | Soft       | `[soft tone]`       | Gentle, quiet        | Comfort, lullabies         |
+| Emphasis   | `[emphasis]`        | Stress a word/phrase | Highlighting key words     |