Commit c51137d

Merge pull request #149 from DeepLcom/acl/acl-2127-rework-voice-docs
[ACL-2127] Various improvements to Voice API docs
2 parents ce55f34 + e3f7fa5 commit c51137d

2 files changed

Lines changed: 94 additions & 75 deletions

api-reference/openapi.yaml

Lines changed: 1 addition & 0 deletions
@@ -4214,6 +4214,7 @@ components:
  supported Voice API source languages and comply with IETF BCP 47 language tags.
  enum:
  - de
+ - cs
  - en
  - es
  - fr

api-reference/voice.mdx

Lines changed: 93 additions & 75 deletions
@@ -5,7 +5,7 @@ description: "API reference for real-time voice transcription and translation wi
  public: true
  ---

- The Voice API provides real-time voice transcription and translation services. It consists of POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.
+ The Voice API provides real-time voice transcription and translation services. It consists of a POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.

  <Info>
  The Voice API is currently available to select DeepL API Pro customers only. Contact your DeepL representative for access.
@@ -20,70 +20,75 @@ The Voice API provides a way to open WebSocket streaming connections to transcri
  * Receive translations in multiple target languages

  The API uses a two-step flow:
- 1. **Request a streaming URL** via POST request
- 2. **Stream audio** via WebSocket
+ 1. [**Request a streaming URL**](/api-reference/voice/get-streaming-url) via POST request
+ 2. [**Stream audio**](/api-reference/voice/websocket-streaming) via WebSocket

  ## Supported Languages

- The following source languages are supported for voice input:
- <Accordion title="Show supported source languages">
- * Chinese (Mandarin)
- * Dutch
- * English
- * French
- * German
- * Indonesian
- * Italian
- * Japanese
- * Korean
- * Polish
- * Portuguese
- * Romanian
- * Russian
- * Spanish
- * Swedish
- * Turkish
- * Ukrainian
- </Accordion>
-
- All source languages can be translated into the following target languages:
-
- <Accordion title="Show supported target languages">
- * Arabic
- * Bulgarian
- * Chinese (Simplified)
- * Chinese (Traditional)
- * Czech
- * Danish
- * Dutch
- * English (American)
- * English (British)
- * Estonian
- * Finnish
- * French
- * German
- * Greek
- * Hebrew
- * Hungarian
- * Indonesian
- * Italian
- * Japanese
- * Korean
- * Latvian
- * Lithuanian
- * Norwegian Bokmål
- * Polish
- * Portuguese (Brazil)
- * Portuguese (Portugal)
- * Romanian
- * Russian
- * Slovak
- * Slovenian
- * Spanish
- * Swedish
- * Turkish
- * Ukrainian
- * Vietnamese
+ All source languages can be translated into any target language.
+
+ <Accordion title="Show supported languages">
+ <Columns cols={2}>
+ <div>
+ <b>Source languages</b>
+ <div>Chinese</div>
+ <div>Czech</div>
+ <div>Dutch</div>
+ <div>English</div>
+ <div>French</div>
+ <div>German</div>
+ <div>Indonesian</div>
+ <div>Italian</div>
+ <div>Japanese</div>
+ <div>Korean</div>
+ <div>Polish</div>
+ <div>Portuguese</div>
+ <div>Romanian</div>
+ <div>Russian</div>
+ <div>Spanish</div>
+ <div>Swedish</div>
+ <div>Turkish</div>
+ <div>Ukrainian</div>
+ </div>
+ <div>
+ <b>Target languages</b>
+ <div>Arabic</div>
+ <div>Bulgarian</div>
+ <div>Chinese (Simplified)</div>
+ <div>Chinese (Traditional)</div>
+ <div>Czech</div>
+ <div>Danish</div>
+ <div>Dutch</div>
+ <div>English (American)</div>
+ <div>English (British)</div>
+ <div>Estonian</div>
+ <div>Finnish</div>
+ <div>French</div>
+ <div>German</div>
+ <div>Greek</div>
+ <div>Hebrew</div>
+ <div>Hungarian</div>
+ <div>Indonesian</div>
+ <div>Italian</div>
+ <div>Japanese</div>
+ <div>Korean</div>
+ <div>Latvian</div>
+ <div>Lithuanian</div>
+ <div>Norwegian Bokmål</div>
+ <div>Polish</div>
+ <div>Portuguese (Brazil)</div>
+ <div>Portuguese (Portugal)</div>
+ <div>Romanian</div>
+ <div>Russian</div>
+ <div>Slovak</div>
+ <div>Slovenian</div>
+ <div>Spanish</div>
+ <div>Swedish</div>
+ <div>Turkish</div>
+ <div>Ukrainian</div>
+ <div>Vietnamese</div>
+ </div>
+ </Columns>
  </Accordion>

  ## Two-Step API Flow
@@ -109,25 +114,33 @@ sequenceDiagram

  par
  loop Send audio data
- Client->>Voice API: SourceMediaChunk
+ Client->>Voice API: source_media_chunk
  end
  and
  loop Receive updates
- Voice API-->>Client: SourceTranscriptUpdate
- Voice API-->>Client: TargetTranscriptUpdate
+ Voice API-->>Client: source_transcript_update
+ end
+ and Per target language
+ loop Receive updates
+ Voice API-->>Client: target_transcript_update
  end
  end

- Client->>Voice API: EndOfSourceAudio
+ Client->>Voice API: end_of_source_audio

- loop Final updates
- Voice API-->>Client: SourceTranscriptUpdate
- Voice API-->>Client: TargetTranscriptUpdate
+ par
+ loop Final updates
+ Voice API-->>Client: source_transcript_update
+ end
+ and Per target language
+ loop Final updates
+ Voice API-->>Client: target_transcript_update
+ end
  end

- Voice API-->>Client: EndOfSourceTranscript
+ Voice API-->>Client: end_of_source_transcript

- Voice API-->>Client: EndOfTargetTranscript<br>(once per target language)
+ Voice API-->>Client: end_of_target_transcript<br>(once per target language)

  Note over Client,Voice API: Connection Closed
  ```
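
Review note on the diagram change: the updated sequence implies a completion rule a client can check. The stream is finished once `end_of_source_transcript` has arrived and `end_of_target_transcript` has arrived once for each target language. The Python sketch below shows that bookkeeping only. The class `TranscriptCompletionTracker`, its `observe` method, and the idea that the client can read a target language code from each `end_of_target_transcript` message are assumptions for illustration; the diff shows message type names but not the payload format.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class TranscriptCompletionTracker:
    """Hypothetical helper: tracks the end-of-transcript messages from the diagram.

    Only the message type names come from the docs; everything else is assumed.
    """

    target_languages: FrozenSet[str]
    source_done: bool = False
    targets_done: Set[str] = field(default_factory=set)

    def observe(self, message_type: str, language: Optional[str] = None) -> None:
        """Record one received message; `language` is an assumed payload field."""
        if message_type == "end_of_source_transcript":
            self.source_done = True
        elif message_type == "end_of_target_transcript":
            if language not in self.target_languages:
                raise ValueError(f"unexpected target language: {language!r}")
            self.targets_done.add(language)
        # Other message types (for example transcript updates) do not
        # change the completion state.

    @property
    def complete(self) -> bool:
        """True once the source and every target transcript have ended."""
        return self.source_done and self.targets_done == set(self.target_languages)
```

A client could create the tracker with the target languages it requested, call `observe` for each incoming message, and treat the session as finished when `complete` becomes true.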
@@ -145,13 +158,21 @@ sequenceDiagram
  * Authentication and authorization
  * Main configuration options (audio format, languages, glossaries, etc.)

+ <Note>
+ URL and token are valid for one-time use only.
+ </Note>
+
  See the [Get Streaming URL](/api-reference/voice/get-streaming-url) documentation for details.
  </Step>
  <Step title="Streaming Audio and Text (WebSocket)">
  Use the received URL to establish a WebSocket connection for:
  * Sending audio data
  * Receiving transcriptions and translations in real-time

+ <Note>
+ Once a WebSocket connection is established, you must send audio data to prevent connection closure.
+ </Note>
+
  See the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation for details.
  </Step>
  </Steps>
@@ -163,6 +184,7 @@ sequenceDiagram
  * Audio chunk size: should not exceed 100 kilobyte or 1 second duration
  * Recommended chunk duration: 50-250 milliseconds for low latency
  * Audio stream speed: maximum 2x real-time
+ * Timeout: If no data is received for 30 seconds, the session will be terminated

  ## Getting Started

@@ -173,7 +195,3 @@ To start using the Voice API:
  3. Review the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation
  4. Choose your audio format and configuration
  5. Implement the two-step flow in your application
-
- <Info>
- For privacy and security, streaming URLs are ephemeral and valid for one-time use only. Once a WebSocket connection is established, you must send audio data to prevent connection closure.
- </Info>
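
Review note on the limits list touched by this commit (chunks of at most 100 kilobyte or 1 second, 50-250 milliseconds recommended, at most 2x real-time): these numbers can be checked before any audio is sent. The Python sketch below assumes uncompressed 16-bit mono PCM at 16 kHz and reads "100 kilobyte" as 100,000 bytes; neither assumption is stated in the diff, and the function names are hypothetical.

```python
MAX_CHUNK_BYTES = 100_000    # assumption: "100 kilobyte" read as 100,000 bytes
MAX_CHUNK_SECONDS = 1.0      # "or 1 second duration"
MAX_SPEED_FACTOR = 2.0       # "maximum 2x real-time"


def chunk_size_bytes(chunk_ms: int, sample_rate_hz: int = 16_000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Return the size of one PCM chunk in bytes, enforcing the documented limits.

    The default audio parameters are illustrative assumptions.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk duration must be positive")
    if chunk_ms > MAX_CHUNK_SECONDS * 1000:
        raise ValueError("chunk is longer than 1 second")
    frames = sample_rate_hz * chunk_ms // 1000
    size = frames * bytes_per_sample * channels
    if size > MAX_CHUNK_BYTES:
        raise ValueError("chunk is larger than the assumed 100,000-byte limit")
    return size


def min_send_interval_seconds(chunk_ms: int, speed_factor: float = 1.0) -> float:
    """Shortest delay between chunks that keeps the stream at speed_factor x real time."""
    if not 0 < speed_factor <= MAX_SPEED_FACTOR:
        raise ValueError("speed factor must be in (0, 2]")
    return chunk_ms / 1000 / speed_factor


print(chunk_size_bytes(100))                 # 3200 bytes per 100 ms chunk
print(min_send_interval_seconds(100, 2.0))   # 0.05 seconds between chunks at 2x
```

Under these assumptions a 100 ms chunk is 3,200 bytes, far below the size limit, and sending one chunk every 0.1 seconds also keeps the connection well inside the 30-second timeout added in this commit.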
