Merge pull request #149 from DeepLcom/acl/acl-2127-rework-voice-docs

sj-dl · web-flow · commit c51137d112a2 · 2025-11-14T11:52:05.000+01:00
[ACL-2127] Various improvements to Voice API docs
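
Review note: the reworked sequence diagram in `voice.mdx` splits the update loops per target language and ends with one `end_of_source_transcript` plus one `end_of_target_transcript` per target language. A minimal sketch of how a client might decide that a session is complete is shown below. The message names come from the diagram; the class name, the method signature, and the way a message's language reaches the handler are hypothetical, since the wire format is not part of this diff.

```python
from typing import Iterable, Optional


class TranscriptCompletion:
    """Tracks the end-of-transcript messages named in the sequence diagram.

    Hypothetical helper: the caller is assumed to have already parsed each
    incoming message into a type name and, for target messages, a language.
    """

    def __init__(self, target_languages: Iterable[str]) -> None:
        # One end_of_target_transcript is expected per target language.
        self._pending_targets = set(target_languages)
        self._source_done = False

    @property
    def done(self) -> bool:
        """True once the source and every target transcript have ended."""
        return self._source_done and not self._pending_targets

    def on_message(self, message_type: str, language: Optional[str] = None) -> bool:
        """Record one message and return whether the session is complete."""
        if message_type == "end_of_source_transcript":
            self._source_done = True
        elif message_type == "end_of_target_transcript":
            self._pending_targets.discard(language)
        # Update messages (source_transcript_update, target_transcript_update)
        # do not change completion state.
        return self.done
```

With target languages `de` and `fr`, `done` only becomes true after `end_of_source_transcript` and both `end_of_target_transcript` messages have been recorded, in any order.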
diff --git a/api-reference/openapi.yaml b/api-reference/openapi.yaml
@@ -4214,6 +4214,7 @@ components:
         supported Voice API source languages and comply with IETF BCP 47 language tags.
       enum:
         - de
+        - cs
         - en
         - es
         - fr
diff --git a/api-reference/voice.mdx b/api-reference/voice.mdx
@@ -5,7 +5,7 @@ description: "API reference for real-time voice transcription and translation wi
 public: true
 ---
 
-The Voice API provides real-time voice transcription and translation services. It consists of POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.
+The Voice API provides real-time voice transcription and translation services. It consists of a POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.
 
 <Info>
   The Voice API is currently available to select DeepL API Pro customers only. Contact your DeepL representative for access.
@@ -20,70 +20,75 @@ The Voice API provides a way to open WebSocket streaming connections to transcri
 * Receive translations in multiple target languages
 
 The API uses a two-step flow:
-1. **Request a streaming URL** via POST request
-2. **Stream audio** via WebSocket
+1. [**Request a streaming URL**](/api-reference/voice/get-streaming-url) via POST request
+2. [**Stream audio**](/api-reference/voice/websocket-streaming) via WebSocket
 
 ## Supported Languages
 
-The following source languages are supported for voice input:
-<Accordion title="Show supported source languages">
-* Chinese (Mandarin)
-* Dutch
-* English
-* French
-* German
-* Indonesian
-* Italian
-* Japanese
-* Korean
-* Polish
-* Portuguese
-* Romanian
-* Russian
-* Spanish
-* Swedish
-* Turkish
-* Ukrainian
-</Accordion>
-
-All source languages can be translated into the following target languages:
-
-<Accordion title="Show supported target languages">
-* Arabic
-* Bulgarian
-* Chinese (Simplified)
-* Chinese (Traditional)
-* Czech
-* Danish
-* Dutch
-* English (American)
-* English (British)
-* Estonian
-* Finnish
-* French
-* German
-* Greek
-* Hebrew
-* Hungarian
-* Indonesian
-* Italian
-* Japanese
-* Korean
-* Latvian
-* Lithuanian
-* Norwegian Bokmål
-* Polish
-* Portuguese (Brazil)
-* Portuguese (Portugal)
-* Romanian
-* Russian
-* Slovak
-* Slovenian
-* Spanish
-* Swedish
-* Turkish
-* Ukrainian
-* Vietnamese
+All source languages can be translated into any target language.
+
+<Accordion title="Show supported languages">
+  <Columns cols={2}>
+  <div>
+    <b>Source languages</b>
+    <div>Chinese</div>
+    <div>Czech</div>
+    <div>Dutch</div>
+    <div>English</div>
+    <div>French</div>
+    <div>German</div>
+    <div>Indonesian</div>
+    <div>Italian</div>
+    <div>Japanese</div>
+    <div>Korean</div>
+    <div>Polish</div>
+    <div>Portuguese</div>
+    <div>Romanian</div>
+    <div>Russian</div>
+    <div>Spanish</div>
+    <div>Swedish</div>
+    <div>Turkish</div>
+    <div>Ukrainian</div>
+  </div>
+  <div>
+    <b>Target languages</b>
+    <div>Arabic</div>
+    <div>Bulgarian</div>
+    <div>Chinese (Simplified)</div>
+    <div>Chinese (Traditional)</div>
+    <div>Czech</div>
+    <div>Danish</div>
+    <div>Dutch</div>
+    <div>English (American)</div>
+    <div>English (British)</div>
+    <div>Estonian</div>
+    <div>Finnish</div>
+    <div>French</div>
+    <div>German</div>
+    <div>Greek</div>
+    <div>Hebrew</div>
+    <div>Hungarian</div>
+    <div>Indonesian</div>
+    <div>Italian</div>
+    <div>Japanese</div>
+    <div>Korean</div>
+    <div>Latvian</div>
+    <div>Lithuanian</div>
+    <div>Norwegian Bokmål</div>
+    <div>Polish</div>
+    <div>Portuguese (Brazil)</div>
+    <div>Portuguese (Portugal)</div>
+    <div>Romanian</div>
+    <div>Russian</div>
+    <div>Slovak</div>
+    <div>Slovenian</div>
+    <div>Spanish</div>
+    <div>Swedish</div>
+    <div>Turkish</div>
+    <div>Ukrainian</div>
+    <div>Vietnamese</div>
+    </div>
+  </Columns>
 </Accordion>
 
 ## Two-Step API Flow
@@ -109,25 +114,33 @@ sequenceDiagram
 
     par 
       loop Send audio data
-        Client->>Voice API: SourceMediaChunk
+        Client->>Voice API: source_media_chunk
       end
     and 
       loop Receive updates
-        Voice API-->>Client: SourceTranscriptUpdate
-        Voice API-->>Client: TargetTranscriptUpdate
+        Voice API-->>Client: source_transcript_update
+      end
+    and Per target language
+      loop Receive updates
+        Voice API-->>Client: target_transcript_update
       end
     end
 
-    Client->>Voice API: EndOfSourceAudio
+    Client->>Voice API: end_of_source_audio
 
-    loop Final updates
-        Voice API-->>Client: SourceTranscriptUpdate
-        Voice API-->>Client: TargetTranscriptUpdate
+    par
+      loop Final updates
+        Voice API-->>Client: source_transcript_update
+      end
+    and Per target language
+      loop Final updates
+        Voice API-->>Client: target_transcript_update
+      end
     end
 
-    Voice API-->>Client: EndOfSourceTranscript
+    Voice API-->>Client: end_of_source_transcript
 
-    Voice API-->>Client: EndOfTargetTranscript<br>(once per target language)
+    Voice API-->>Client: end_of_target_transcript<br>(once per target language)
 
     Note over Client,Voice API: Connection Closed
 ```
@@ -145,13 +158,21 @@ sequenceDiagram
     * Authentication and authorization
     * Main configuration options (audio format, languages, glossaries, etc.)
 
+    <Note>
+      URL and token are valid for one-time use only.
+    </Note>
+
     See the [Get Streaming URL](/api-reference/voice/get-streaming-url) documentation for details.
   </Step>
   <Step title="Streaming Audio and Text (WebSocket)">
     Use the received URL to establish a WebSocket connection for:
     * Sending audio data
     * Receiving transcriptions and translations in real-time
 
+    <Note>
+      Once a WebSocket connection is established, you must send audio data to prevent connection closure.
+    </Note>
+
     See the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation for details.
   </Step>
 </Steps>
@@ -163,6 +184,7 @@ sequenceDiagram
 * Audio chunk size: should not exceed 100 kilobyte or 1 second duration
 * Recommended chunk duration: 50-250 milliseconds for low latency
 * Audio stream speed: maximum 2x real-time
+* Timeout: If no data is received for 30 seconds, the session will be terminated
 
 ## Getting Started
 
@@ -173,7 +195,3 @@ To start using the Voice API:
 3. Review the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation
 4. Choose your audio format and configuration
 5. Implement the two-step flow in your application
-
-<Info>
-  For privacy and security, streaming URLs are ephemeral and valid for one-time use only. Once a WebSocket connection is established, you must send audio data to prevent connection closure.
-</Info>
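
Review note: the limits list in `voice.mdx` (chunk size up to 100 kilobyte or 1 second, 50-250 milliseconds recommended, at most 2x real-time, 30-second timeout) can be turned into a small pre-flight check. The sketch below is illustrative only. It assumes uncompressed PCM audio, reads "100 kilobyte" as 100,000 bytes, and uses hypothetical names (`plan_chunks`, `ChunkPlan`); none of these are defined by the Voice API docs in this diff.

```python
from dataclasses import dataclass

# Limits quoted from the docs in this diff.
MAX_CHUNK_BYTES = 100_000        # assumption: "100 kilobyte" means 100,000 bytes
MAX_CHUNK_SECONDS = 1.0
RECOMMENDED_CHUNK_MS = (50, 250)
MAX_SPEED_FACTOR = 2.0           # audio may be sent at most 2x real-time
IDLE_TIMEOUT_SECONDS = 30.0      # session ends if no data arrives for this long


@dataclass(frozen=True)
class ChunkPlan:
    chunk_bytes: int                  # bytes to send per chunk
    chunk_seconds: float              # audio duration covered by one chunk
    min_send_interval_seconds: float  # shortest gap between sends at 2x real-time
    low_latency: bool                 # True if within the recommended 50-250 ms


def plan_chunks(sample_rate_hz: int, bytes_per_sample: int,
                channels: int, chunk_ms: int = 100) -> ChunkPlan:
    """Compute a chunk size for raw PCM audio and check it against the limits."""
    frame_bytes = bytes_per_sample * channels
    bytes_per_second = sample_rate_hz * frame_bytes
    # Round down to a whole number of frames so samples are never split.
    chunk_bytes = (bytes_per_second * chunk_ms // 1000) // frame_bytes * frame_bytes
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small to hold a single audio frame")
    chunk_seconds = chunk_bytes / bytes_per_second
    if chunk_bytes > MAX_CHUNK_BYTES or chunk_seconds > MAX_CHUNK_SECONDS:
        raise ValueError("chunk exceeds the 100 kilobyte / 1 second limit")
    low, high = RECOMMENDED_CHUNK_MS
    return ChunkPlan(
        chunk_bytes=chunk_bytes,
        chunk_seconds=chunk_seconds,
        min_send_interval_seconds=chunk_seconds / MAX_SPEED_FACTOR,
        low_latency=low <= chunk_ms <= high,
    )
```

For example, `plan_chunks(16000, 2, 1, chunk_ms=100)` plans 3200-byte chunks for 16 kHz, 16-bit mono audio. A sender would also need to keep the gap between sends well under `IDLE_TIMEOUT_SECONDS` to avoid the session being terminated.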
