fishaudio · leng-yue · Jun 24, 2026 · Jun 24, 2026
diff --git a/developer-guide/models-pricing/choosing-a-model.mdx b/developer-guide/models-pricing/choosing-a-model.mdx
@@ -12,7 +12,8 @@ import { AudioTranscript } from '/snippets/audio-transcript.jsx';
   <AudioTranscript page="models-pricing-choosing-a-model" />
 </Visibility>
 
+We recommend **Fish Audio S2.1-Pro** for production projects. It improves on S2-Pro in quality, latency, and throughput, and is the right choice when you need production time-to-first-audio (TTFA) and DPA guarantees.
 
-We recommend using **Fish Audio S2-Pro** for all projects - our flagship model with industry-leading quality and performance.
+Use **`s2.1-pro-free`** for testing, prototyping, development, and smaller businesses. It is the same model as S2.1-Pro at $0, but it does not guarantee TTFA or DPA.
 
-<Support />
+<Support />
diff --git a/developer-guide/models-pricing/models-overview.mdx b/developer-guide/models-pricing/models-overview.mdx
@@ -20,18 +20,40 @@ Fish Audio offers state-of-the-art text-to-speech models optimized for different
 
 ### Recommended Model
 
-<Card title="s2-pro" icon="star">
-  **Fish Audio S2-Pro** - Our next-generation TTS model with best-in-class performance
+<Card title="s2.1-pro" icon="star">
+  **Fish Audio S2.1-Pro** - Our recommended production TTS model and an improved version of S2-Pro
   - Natural language control with `[bracket]` syntax — not limited to a fixed set (e.g., `[whispers sweetly]`, `[laughing nervously]`)
-  - Multi-speaker dialogue support **(S2-Pro exclusive)**
+  - Multi-speaker dialogue support
+  - 83 languages
+  - Improved quality, latency, and throughput over S2-Pro
+  - Production option for workloads that need TTFA and DPA guarantees
+</Card>
+
+### Free Development Model
+
+<Card title="s2.1-pro-free" icon="flask">
+  **Fish Audio S2.1-Pro Free** - The same model as S2.1-Pro, available at $0 for development and testing
+  - Use the `s2.1-pro-free` model string with the same TTS API endpoint
+  - Same model quality and language coverage as `s2.1-pro`
+  - Free to use under fair-use limits
+  - No TTFA or DPA guarantees
+  - Best for testing, prototyping, development, and smaller businesses
+</Card>
+
+### Previous S2 Model
+
+<Card title="s2-pro" icon="microchip">
+  **Fish Audio S2-Pro** - Previous-generation S2 TTS model
+  - Natural language control with `[bracket]` syntax — not limited to a fixed set (e.g., `[whispers sweetly]`, `[laughing nervously]`)
+  - Multi-speaker dialogue support
   - 80+ languages
   - 100ms time-to-first-audio
   - Full SGLang-based serving stack
   - Open-source
 </Card>
 
 <Note>
-We recommend using `s2-pro` for all new projects to access the latest capabilities and performance improvements. S1 remains available for existing integrations.
+We recommend using `s2.1-pro` for production projects. Use `s2.1-pro-free` for evaluation, prototyping, development, or smaller businesses when you want the same model and do not need TTFA or DPA guarantees. S1 remains available for existing integrations.
 </Note>
 
 ### Previous Model
@@ -54,9 +76,9 @@ We recommend using `s2-pro` for all new projects to access the latest capabiliti
 
 ## Supported Languages
 
-### S2-Pro
+### S2.1-Pro and S2-Pro
 
-S2-Pro supports 80+ languages with automatic language detection and inline emotion and paralinguistic cue support.
+S2.1-Pro supports 83 languages, while S2-Pro supports 80+ languages. Both use automatic language detection and support inline emotion and paralinguistic cues.
 
 <Info>
 Language detection is automatic - simply provide text in your target language.
@@ -76,9 +98,9 @@ Russian, Dutch, Italian, Polish, Portuguese
 
 Fish Audio models support emotional expressions and voice styles that can be controlled through text markers in your input.
 
-### S2-Pro Natural Language Control
+### S2.1-Pro and S2-Pro Natural Language Control
 
-S2-Pro treats `[bracket]` tags as standard text rather than dedicated control tokens. Through training on massive datasets, the model learned implicit mappings between natural language descriptions and acoustic variations. This means you are not limited to a predefined set of tags — you can use any descriptive expression and the model will interpret it, such as `[whispers sweetly]` or `[laughing nervously]`.
+S2.1-Pro and S2-Pro treat `[bracket]` tags as standard text rather than dedicated control tokens. Through training on massive datasets, the models learned implicit mappings between natural language descriptions and acoustic variations. This means you are not limited to a predefined set of tags — you can use any descriptive expression and the model will interpret it, such as `[whispers sweetly]` or `[laughing nervously]`.
 
 Common examples include:
 
@@ -88,7 +110,7 @@ Common examples include:
 ```
 
 <Tip>
-S2-Pro cues can be placed anywhere in your text to control emotion at specific positions. For example: `"I can't believe it [gasp] you actually did it [laugh]"`
+S2.1-Pro and S2-Pro cues can be placed anywhere in your text to control emotion at specific positions. For example: `"I can't believe it [gasp] you actually did it [laugh]"`
 </Tip>
 
 ### S1 Voice Styles and Emotions
@@ -127,4 +149,4 @@ S1 supports 64+ emotional expressions using `(parenthesis)` syntax.
 You can also use natural expressions like "Ha,ha,ha" for laughter. Experiment with combinations to achieve the perfect emotional tone for your application.
 </Tip>
 
-<Support />
+<Support />
diff --git a/developer-guide/models-pricing/pricing-and-rate-limits.mdx b/developer-guide/models-pricing/pricing-and-rate-limits.mdx
@@ -21,10 +21,12 @@ The Fish Audio API uses pay-as-you-go pricing based on actual usage. There are n
 
 TTS pricing is based on the size of input text, measured in millions of UTF-8 bytes.
 
-| Model Name   | Price (USD)            |
-|--------------|------------------------|
-| `s2-pro`     | $15.00 / M UTF-8 bytes |
-| `s1`         | $15.00 / M UTF-8 bytes |
+| Model Name        | Price (USD)            |
+|-------------------|------------------------|
+| `s2.1-pro`        | $15.00 / M UTF-8 bytes |
+| `s2.1-pro-free`   | $0.00 / M UTF-8 bytes  |
+| `s2-pro`          | $15.00 / M UTF-8 bytes |
+| `s1`              | $15.00 / M UTF-8 bytes |
 
 <Info>
 1M UTF-8 bytes is approximately 180,000 English words, or about 12 hours of speech

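Note on the pricing table in the hunk above: because billing is per million UTF-8 bytes rather than per character, non-ASCII text costs more per character than ASCII text. The sketch below is a minimal illustration of that arithmetic, using only the prices listed in the table. The function name `estimate_tts_cost` and the price dictionary are hypothetical helpers, not part of any Fish Audio SDK, and the sketch assumes no rounding or minimum charge, which the diff does not specify.

```python
# Hypothetical helper: prices copied from the table in this diff (USD per 1M UTF-8 bytes).
PRICE_PER_MILLION_BYTES_USD = {
    "s2.1-pro": 15.00,
    "s2.1-pro-free": 0.00,
    "s2-pro": 15.00,
    "s1": 15.00,
}


def estimate_tts_cost(text: str, model: str) -> float:
    """Estimate the cost in USD of synthesizing `text` with `model`.

    Assumes cost scales linearly with the UTF-8 byte length of the input,
    with no rounding or minimum charge (an assumption, not confirmed here).
    """
    if model not in PRICE_PER_MILLION_BYTES_USD:
        raise ValueError(f"unknown model: {model!r}")
    byte_count = len(text.encode("utf-8"))  # bytes, not characters
    return byte_count / 1_000_000 * PRICE_PER_MILLION_BYTES_USD[model]


if __name__ == "__main__":
    ascii_text = "a" * 1_000_000   # 1 byte per character
    accented = "é" * 500_000       # 2 bytes per character in UTF-8
    print(estimate_tts_cost(ascii_text, "s2.1-pro"))
    print(estimate_tts_cost(accented, "s1"))
    print(estimate_tts_cost(ascii_text, "s2.1-pro-free"))
```

Half a million two-byte characters bill the same as one million ASCII characters, which is why the estimate encodes the text before measuring it.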
diff --git a/features/text-to-speech.mdx b/features/text-to-speech.mdx
@@ -5,7 +5,7 @@
 icon: "microphone"
 ---
 
-Generate natural speech from text with the `s2-pro` and `s1` models. Pick a voice, choose a format, and go — from the API directly, the Python library, or JavaScript.
+Generate natural speech from text with the `s2.1-pro`, `s2-pro`, and `s1` models. Pick a voice, choose a format, and go — from the API directly, the Python library, or JavaScript.
 
 <CardGroup cols={3}>
   <Card title="Use it in the web app" icon="browser" href="https://fish.audio/app/text-to-speech">
@@ -23,7 +23,7 @@

 <CardGroup cols={2}>
  <Card title="Voiceovers & narration" icon="film">
    Audiobooks, explainers, ads, and video narration.
  </Card>
  <Card title="Conversational AI" icon="comments">
    Speak an assistant's replies — pair with [streaming](/features/realtime-streaming) for low latency.
@@ -42,7 +42,7 @@

 <CodeGroup>
 ```python Python
 from fishaudio import FishAudio
 from fishaudio.utils import save

 client = FishAudio()  # reads FISH_API_KEY
@@ -107,7 +107,9 @@
 
 ### Models
 
-- **`s2-pro`** (default) — highest quality, multi-speaker, natural-language expression control.
+- **`s2.1-pro`** — recommended for production, with improved quality, latency, and throughput over S2-Pro.
+- **`s2.1-pro-free`** — the same model at $0 for testing, prototyping, development, and smaller businesses, without TTFA or DPA guarantees.
+- **`s2-pro`** (default) — previous-generation S2 model with multi-speaker and natural-language expression control.
 - **`s1`** — previous generation, `(parenthesis)` emotion tags.
 
 In the API, select with the `model` request header. In Python, pass `model="s2-pro"`. See [Choosing a Model](/developer-guide/models-pricing/choosing-a-model).
@@ -191,16 +193,16 @@

 To reuse a voice across many requests, [clone it once](/features/voice-cloning) and pass the resulting `reference_id` instead.

 ### Format & bitrate

 Pick a format for your delivery channel, and tune bitrate to trade size against quality:

 | Format | Notes |
 |---|---|
 | `mp3` (default) | good size/quality balance; set `mp3_bitrate` to `64`, `128`, or `192` |
 | `wav` | uncompressed, highest quality; set `sample_rate` (e.g. `44100`) |
 | `pcm` | raw samples, no container — for low-latency playback and telephony pipelines |
 | `opus` | efficient for streaming; bitrate is automatic (`opus_bitrate=-1000`) |

 ```python
 from fishaudio.types import TTSConfig

diff --git a/overview/capabilities.mdx b/overview/capabilities.mdx
@@ -11,7 +11,7 @@
 
 <CardGroup cols={2}>
   <Card title="Text to Speech" icon="microphone" href="/features/text-to-speech">
-    Convert text into lifelike speech with the `s2-pro` and `s1` models.
+    Convert text into lifelike speech with the `s2.1-pro`, `s2-pro`, and `s1` models.
   </Card>
 
   <Card title="Speech to Text" icon="waveform" href="/features/speech-to-text">
@@ -41,7 +41,7 @@
  </Card>

  <Card title="Story Studio" icon="book-open" href="/overview/platform">
    Produce multi-speaker, long-form audio — audiobooks and narration.
  </Card>

  <Card title="Music & Sound Effects" icon="music" href="/overview/platform">
@@ -55,9 +55,11 @@
 
 ## Models
 
-Two text-to-speech models power most capabilities:
+These text-to-speech models power most capabilities:
 
-- **`s2-pro`** — the default, highest-quality model, with multi-speaker and natural-language expression control.
+- **`s2.1-pro`** — the recommended production model, with improved quality, latency, and throughput over S2-Pro.
+- **`s2.1-pro-free`** — the same model at $0 for testing, prototyping, development, and smaller businesses, without TTFA or DPA guarantees.
+- **`s2-pro`** — the previous-generation S2 model, with multi-speaker and natural-language expression control.
 - **`s1`** — the previous generation, with `(parenthesis)` emotion tags.
 
 See [Models Overview](/developer-guide/models-pricing/models-overview) and [Choosing a Model](/developer-guide/models-pricing/choosing-a-model) for the full lineup, languages, and limits.
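
Note on the model lineup above: the selection guidance in this diff reduces to a small decision, and the text-to-speech page says the API selects a model with the `model` request header. The sketch below is one way to encode that guidance. The names `pick_model` and `model_header` are hypothetical, the decision rule is a simplification of the prose (it ignores `s2-pro`), and any other headers a real request needs, such as authentication, are deliberately left out because this diff does not describe them.

```python
# Model strings as they appear in this diff.
KNOWN_MODELS = {"s2.1-pro", "s2.1-pro-free", "s2-pro", "s1"}


def pick_model(needs_guarantees: bool, legacy_s1: bool = False) -> str:
    """Pick a model string following the guidance in the docs.

    needs_guarantees: True for production workloads that need TTFA and DPA
        guarantees; False for testing, prototyping, and development.
    legacy_s1: True for existing integrations that should stay on S1.
    """
    if legacy_s1:
        return "s1"
    return "s2.1-pro" if needs_guarantees else "s2.1-pro-free"


def model_header(model: str) -> dict[str, str]:
    """Build only the `model` request header described in the docs."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return {"model": model}


if __name__ == "__main__":
    print(model_header(pick_model(needs_guarantees=True)))
    print(model_header(pick_model(needs_guarantees=False)))
```

Keeping the choice in one function makes it easy to move a project from `s2.1-pro-free` to `s2.1-pro` when it goes to production, since both use the same TTS API endpoint.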