fishaudio · cshape · Jun 19, 2026
diff --git a/api-reference/openapi.json b/api-reference/openapi.json
@@ -1345,11 +1345,7 @@
                       "type": "string"
                     },
                     "train_mode": {
-                      "default": "full",
-                      "enum": [
-                        "fast",
-                        "full"
-                      ],
+                      "const": "fast",
                       "title": "Train Mode",
                       "type": "string"
                     },
@@ -1647,7 +1643,7 @@
                       "type": "string"
                     },
                     "train_mode": {
-                      "default": "full",
+                      "default": "fast",
                       "enum": [
                         "fast",
                         "full"
@@ -4052,7 +4048,7 @@
             "type": "string"
           },
           "train_mode": {
-            "default": "full",
+            "default": "fast",
             "enum": [
               "fast",
               "full"

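The three hunks above narrow one `train_mode` field to the constant `fast` and flip the default of two others from `full` to `fast`. A minimal sketch of what that means for a client, assuming schema fragments copied by hand from this diff. The helper names `is_valid_train_mode` and `default_train_mode` are hypothetical and not part of any Fish Audio SDK:

```python
# Schema fragments copied by hand from the hunks above (illustrative only).
CONST_SCHEMA = {"const": "fast", "title": "Train Mode", "type": "string"}
ENUM_SCHEMA = {
    "default": "fast",
    "enum": ["fast", "full"],
    "title": "Train Mode",
    "type": "string",
}


def is_valid_train_mode(schema, value):
    """Check a value against the `const` or `enum` keyword of a fragment.

    Hypothetical helper: it covers only the two keywords this diff touches,
    not full JSON Schema validation.
    """
    if "const" in schema:
        return value == schema["const"]
    return value in schema.get("enum", [])


def default_train_mode(schema):
    """Return the declared default, or None when the fragment declares none."""
    return schema.get("default")


print(is_valid_train_mode(CONST_SCHEMA, "full"))  # const fragment rejects "full"
print(is_valid_train_mode(ENUM_SCHEMA, "full"))   # enum fragment still accepts it
print(default_train_mode(ENUM_SCHEMA))            # default is now "fast"
```

Under these fragments, a caller that relied on the old `full` default would need to pass `train_mode` explicitly where the enum still allows it.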
diff --git a/developer-guide/core-features/emotions.mdx b/developer-guide/core-features/emotions.mdx
@@ -60,9 +60,17 @@
 
 <AdvancedEmotions />
 
-### Tone Markers (5 expressions)
+## Sound & Delivery Markers
 
-Control volume and intensity:
+These markers aren't emotions — they shape *how* a line is delivered, add natural human sounds, or layer in ambient effects. Combine them with the emotion cues above.
+
+### Tone Markers (6 expressions)
+
+Control volume, intensity, and emphasis. Place `[emphasis]` right before the word or phrase you want to stress:
+
+```text
+This is [emphasis] really important.
+```
 
 <ToneMarkers />
 
@@ -159,7 +167,7 @@
 - Use natural expressions when possible
 - Space out emotional changes for realism

 ### Don'ts

 - Don't overuse emotion tags in short text
 - Don't mix conflicting emotions
@@ -246,7 +254,7 @@
 | Whispered Secret | `[mysterious][whispering]` | "I have something to tell you..."     |
 | Angry Shout      | `[angry][shouting]`        | "Stop right there!"                   |
 | Sad Sigh         | `[sad][sighing]`           | "I wish things were different. Sigh." |
 | Excited Laugh    | `[excited][laughing]`      | "We did it! Ha ha!"                   |
 | Nervous Question | `[nervous][uncertain]`     | "Are you sure about this?"            |

 ## S1 (legacy) syntax

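The placement rule for `[emphasis]` added in this file (put the marker right before the word or phrase to stress) can be sketched as a small text helper. `add_emphasis` is a hypothetical name used only for illustration; it is plain string handling and does not call the Fish Audio SDK:

```python
def add_emphasis(text, phrase):
    """Insert the [emphasis] marker right before the first match of `phrase`.

    Hypothetical helper for illustration; raises ValueError if `phrase`
    does not occur in `text`.
    """
    index = text.find(phrase)
    if index == -1:
        raise ValueError(f"phrase not found in text: {phrase!r}")
    # Keep everything before the phrase, then marker, a space, and the rest.
    return f"{text[:index]}[emphasis] {text[index:]}"


print(add_emphasis("This is really important.", "really important"))
# prints: This is [emphasis] really important.
```

Raising on a missing phrase, rather than returning the text unchanged, makes a misplaced marker visible before the text reaches synthesis.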
diff --git a/features/realtime-streaming.mdx b/features/realtime-streaming.mdx
@@ -1,5 +1,5 @@
 ---
 title: "Realtime Streaming"
 description: "Stream audio as it generates for the lowest latency"
 icon: "bolt"
 ---
@@ -41,11 +41,11 @@

 <CodeGroup>
 ```python Python
 from fishaudio import FishAudio

 client = FishAudio()  # reads FISH_API_KEY

 with open("out.mp3", "wb") as f:
    for chunk in client.tts.stream(text="Streaming keeps latency low."):
        f.write(chunk)  # or send to a speaker / socket as it arrives

@@ -92,7 +92,7 @@

 <CodeGroup>
 ```python Python
 from fishaudio import FishAudio
 from fishaudio.utils import play

 client = FishAudio()
@@ -151,14 +151,26 @@
 
 Both streaming paths take a `latency` mode:
 
-- `latency="balanced"` (default) — lowest time-to-first-audio. Use it for voice agents and live LLM output.
+- `latency="balanced"` (Python SDK default) — lowest time-to-first-audio the SDK offers. Use it for voice agents and live LLM output.
 - `latency="normal"` — slightly higher latency, best audio quality. Use it for narration where you can afford a beat.
 
 ```python
 for chunk in client.tts.stream_websocket(llm_tokens(), latency="balanced"):
     ...
 ```
 
+<Warning>
+  **Set `latency` explicitly for real-time use.** The Python SDK defaults to `balanced`, but the raw HTTP/WebSocket API defaults to `normal`, which is tuned for quality and noticeably increases time-to-first-audio — you may wait several seconds for the first chunk. If you call the API directly, or through a third-party integration such as the LiveKit plugin, pass `balanced` (or `low`) for interactive latency.
+</Warning>
+
+The available modes differ slightly between the raw API and the SDK:
+
+| Mode       | Raw HTTP/WebSocket API | Python SDK    | Behavior                                       |
+| ---------- | ---------------------- | ------------- | ---------------------------------------------- |
+| `low`      | Supported              | Not available | Lowest latency                                 |
+| `balanced` | Supported              | Default       | Reduced latency — recommended for real-time    |
+| `normal`   | Default                | Supported     | Best quality, highest time-to-first-audio      |
+
 For finer control, pass a `TTSConfig` with chunk tuning. Smaller chunks emit audio sooner (lower latency); larger chunks give the model more context (smoother prosody):
 
 ```python
@@ -176,7 +188,7 @@

 ## Stream asynchronously

 For asyncio apps, `AsyncFishAudio` exposes the same streaming methods. `stream_websocket` accepts an async generator, so you can pipe an async LLM client straight into speech.

 ```python
 import asyncio

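The default mismatch described in the warning and table of this file can be sketched as a lookup. The values below are copied from that table; the helper `effective_latency` and the interface labels `raw_api` and `python_sdk` are hypothetical names for illustration, not SDK or API identifiers:

```python
# Modes and defaults as listed in the table in this diff (illustrative copy).
SUPPORTED_MODES = {
    "raw_api": ("low", "balanced", "normal"),
    "python_sdk": ("balanced", "normal"),
}
DEFAULT_MODE = {"raw_api": "normal", "python_sdk": "balanced"}


def effective_latency(interface, requested=None):
    """Return the latency mode a call would run with.

    Hypothetical helper: `interface` is "raw_api" or "python_sdk".
    When `requested` is None, the interface's documented default applies.
    """
    if interface not in SUPPORTED_MODES:
        raise ValueError(f"unknown interface: {interface!r}")
    if requested is None:
        return DEFAULT_MODE[interface]
    if requested not in SUPPORTED_MODES[interface]:
        raise ValueError(f"{requested!r} is not available on {interface}")
    return requested


print(effective_latency("python_sdk"))           # balanced
print(effective_latency("raw_api"))              # normal
print(effective_latency("raw_api", "balanced"))  # balanced
```

The same omitted argument resolves to different modes on the two interfaces, which is why the warning recommends always passing `latency` explicitly for real-time use.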
diff --git a/features/text-to-speech.mdx b/features/text-to-speech.mdx
@@ -23,7 +23,7 @@

 <CardGroup cols={2}>
  <Card title="Voiceovers & narration" icon="film">
    Audiobooks, explainers, ads, and video narration.
  </Card>
  <Card title="Conversational AI" icon="comments">
    Speak an assistant's replies — pair with [streaming](/features/realtime-streaming) for low latency.
@@ -42,7 +42,7 @@

 <CodeGroup>
 ```python Python
 from fishaudio import FishAudio
 from fishaudio.utils import save

 client = FishAudio()  # reads FISH_API_KEY
@@ -191,16 +191,16 @@

 To reuse a voice across many requests, [clone it once](/features/voice-cloning) and pass the resulting `reference_id` instead.

 ### Format & bitrate

 Pick a format for your delivery channel, and tune bitrate to trade size against quality:

 | Format | Notes |
 |---|---|
 | `mp3` (default) | good size/quality balance; set `mp3_bitrate` to `64`, `128`, or `192` |
 | `wav` | uncompressed, highest quality; set `sample_rate` (e.g. `44100`) |
 | `pcm` | raw samples, no container — for low-latency playback and telephony pipelines |
 | `opus` | efficient for streaming; bitrate is automatic (`opus_bitrate=-1000`) |

 ```python
 from fishaudio.types import TTSConfig
@@ -215,10 +215,15 @@
 
 `latency` trades stability for speed; `chunk_length` controls how much text the engine batches before it starts generating.
 
-- `latency="balanced"` (default) — lower time-to-first-audio (~300ms). Good for interactive use.
+- `latency="balanced"` (Python SDK default) — lower time-to-first-audio (~300ms). Good for interactive use.
 - `latency="normal"` — most stable output, at slightly higher latency.
+- `latency="low"` (raw API only) — lowest latency.
 - `chunk_length` (`100`–`300`, default `200`) — smaller chunks start audio sooner; larger chunks are more efficient for long text.
 
+<Note>
+  The raw HTTP/WebSocket API defaults `latency` to `normal` (quality-tuned), while the Python SDK defaults to `balanced`. For real-time use over the raw API, set `latency` to `balanced` or `low` explicitly — see [Tune latency vs. quality](/features/realtime-streaming#tune-latency-vs-quality).
+</Note>
+
 <CodeGroup>
 ```python Python
 from fishaudio.types import TTSConfig

diff --git a/snippets/emotion-list-tones-s2.mdx b/snippets/emotion-list-tones-s2.mdx
@@ -5,3 +5,4 @@
 | Screaming  | `[screaming]`       | Very loud, panicked  | Emergencies, fear          |
 | Whispering | `[whispering]`      | Very soft, secretive | Secrets, quiet scenes      |
 | Soft       | `[soft tone]`       | Gentle, quiet        | Comfort, lullabies         |
+| Emphasis   | `[emphasis]`        | Stressed word/phrase | Highlighting key words     |