api-reference/voice.mdx
Lines changed: 93 additions & 75 deletions
@@ -5,7 +5,7 @@ description: "API reference for real-time voice transcription and translation wi
5
5
public: true
6
6
---
7
7
8
-
The Voice API provides real-time voice transcription and translation services. It consists of POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.
8
+
The Voice API provides real-time voice transcription and translation services. It consists of a POST endpoint `voice/realtime` to initialize a session and a WebSocket endpoint `voice/realtime/connect` to stream audio data.
9
9
10
10
<Info>
11
11
The Voice API is currently available to select DeepL API Pro customers only. Contact your DeepL representative for access.
@@ -20,70 +20,75 @@ The Voice API provides a way to open WebSocket streaming connections to transcri
20
20
* Receive translations in multiple target languages
21
21
22
22
The API uses a two-step flow:
23
-
1. **Request a streaming URL** via POST request
24
-
2. **Stream audio** via WebSocket
23
+
1. [**Request a streaming URL**](/api-reference/voice/get-streaming-url) via POST request
24
+
2. [**Stream audio**](/api-reference/voice/websocket-streaming) via WebSocket
25
25
26
26
## Supported Languages
27
27
28
-
The following source languages are supported for voice input:
All source languages can be translated into any target language.
29
+
30
+
<Accordion title="Show supported languages">
31
+
<Columns cols={2}>
32
+
<div>
33
+
<b>Source languages</b>
34
+
<div>Chinese</div>
35
+
<div>Czech</div>
36
+
<div>Dutch</div>
37
+
<div>English</div>
38
+
<div>French</div>
39
+
<div>German</div>
40
+
<div>Indonesian</div>
41
+
<div>Italian</div>
42
+
<div>Japanese</div>
43
+
<div>Korean</div>
44
+
<div>Polish</div>
45
+
<div>Portuguese</div>
46
+
<div>Romanian</div>
47
+
<div>Russian</div>
48
+
<div>Spanish</div>
49
+
<div>Swedish</div>
50
+
<div>Turkish</div>
51
+
<div>Ukrainian</div>
52
+
</div>
53
+
<div>
54
+
<b>Target languages</b>
55
+
<div>Arabic</div>
56
+
<div>Bulgarian</div>
57
+
<div>Chinese (Simplified)</div>
58
+
<div>Chinese (Traditional)</div>
59
+
<div>Czech</div>
60
+
<div>Danish</div>
61
+
<div>Dutch</div>
62
+
<div>English (American)</div>
63
+
<div>English (British)</div>
64
+
<div>Estonian</div>
65
+
<div>Finnish</div>
66
+
<div>French</div>
67
+
<div>German</div>
68
+
<div>Greek</div>
69
+
<div>Hebrew</div>
70
+
<div>Hungarian</div>
71
+
<div>Indonesian</div>
72
+
<div>Italian</div>
73
+
<div>Japanese</div>
74
+
<div>Korean</div>
75
+
<div>Latvian</div>
76
+
<div>Lithuanian</div>
77
+
<div>Norwegian Bokmål</div>
78
+
<div>Polish</div>
79
+
<div>Portuguese (Brazil)</div>
80
+
<div>Portuguese (Portugal)</div>
81
+
<div>Romanian</div>
82
+
<div>Russian</div>
83
+
<div>Slovak</div>
84
+
<div>Slovenian</div>
85
+
<div>Spanish</div>
86
+
<div>Swedish</div>
87
+
<div>Turkish</div>
88
+
<div>Ukrainian</div>
89
+
<div>Vietnamese</div>
90
+
</div>
91
+
</Columns>
87
92
</Accordion>
88
93
89
94
## Two-Step API Flow
@@ -109,25 +114,33 @@ sequenceDiagram
109
114
110
115
par
111
116
loop Send audio data
112
-
Client->>Voice API: SourceMediaChunk
117
+
Client->>Voice API: source_media_chunk
113
118
end
114
119
and
115
120
loop Receive updates
116
-
Voice API-->>Client: SourceTranscriptUpdate
117
-
Voice API-->>Client: TargetTranscriptUpdate
121
+
Voice API-->>Client: source_transcript_update
122
+
end
123
+
and Per target language
124
+
loop Receive updates
125
+
Voice API-->>Client: target_transcript_update
118
126
end
119
127
end
120
128
121
-
Client->>Voice API: EndOfSourceAudio
129
+
Client->>Voice API: end_of_source_audio
122
130
123
-
loop Final updates
124
-
Voice API-->>Client: SourceTranscriptUpdate
125
-
Voice API-->>Client: TargetTranscriptUpdate
131
+
par
132
+
loop Final updates
133
+
Voice API-->>Client: source_transcript_update
134
+
end
135
+
and Per target language
136
+
loop Final updates
137
+
Voice API-->>Client: target_transcript_update
138
+
end
126
139
end
127
140
128
-
Voice API-->>Client: EndOfSourceTranscript
141
+
Voice API-->>Client: end_of_source_transcript
129
142
130
-
Voice API-->>Client: EndOfTargetTranscript<br>(once per target language)
143
+
Voice API-->>Client: end_of_target_transcript<br>(once per target language)
131
144
132
145
Note over Client,Voice API: Connection Closed
133
146
```
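
The sequence diagram implies that a session is finished only once `end_of_source_transcript` and one `end_of_target_transcript` per target language have arrived. A minimal sketch of that bookkeeping follows. The message envelope (a dict with `type` and `language` keys) and the class name `SessionTracker` are assumptions made for illustration; they are not taken from the Voice API reference.

```python
from dataclasses import dataclass, field


@dataclass
class SessionTracker:
    """Tracks which end-of-transcript messages have been received."""

    target_languages: set[str]
    source_done: bool = False
    targets_done: set[str] = field(default_factory=set)

    def handle(self, message: dict) -> None:
        # Assumed envelope: {"type": <message name>, "language": <code>}.
        kind = message.get("type")
        if kind == "end_of_source_transcript":
            self.source_done = True
        elif kind == "end_of_target_transcript":
            self.targets_done.add(message["language"])
        # source_transcript_update / target_transcript_update carry text
        # and do not affect completion, so they are ignored here.

    @property
    def complete(self) -> bool:
        # Done when the source and every requested target have ended.
        return self.source_done and self.targets_done >= self.target_languages
```

A client loop could call `handle()` for each incoming message and stop reading once `complete` is true.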
@@ -145,13 +158,21 @@ sequenceDiagram
145
158
* Authentication and authorization
146
159
* Main configuration options (audio format, languages, glossaries, etc.)
147
160
161
+
<Note>
162
+
The URL and token are valid for one-time use only.
163
+
</Note>
164
+
148
165
See the [Get Streaming URL](/api-reference/voice/get-streaming-url) documentation for details.
149
166
</Step>
150
167
<Step title="Streaming Audio and Text (WebSocket)">
151
168
Use the received URL to establish a WebSocket connection for:
152
169
* Sending audio data
153
170
* Receiving transcriptions and translations in real-time
154
171
172
+
<Note>
173
+
Once a WebSocket connection is established, you must send audio data to prevent connection closure.
174
+
</Note>
175
+
155
176
See the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation for details.
156
177
</Step>
157
178
</Steps>
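
The ordering of the two steps can be sketched without committing to any HTTP or WebSocket library. In the sketch below, `request_streaming_url` and `connect` are hypothetical callables supplied by the caller, and the `Socket` protocol is an assumption; message framing (for example wrapping audio in `source_media_chunk` and sending `end_of_source_audio`) is deliberately left out.

```python
from typing import Callable, Iterable, Protocol


class Socket(Protocol):
    """Assumed minimal interface of a WebSocket client object."""

    def send(self, data: bytes) -> None: ...
    def close(self) -> None: ...


def stream_once(
    request_streaming_url: Callable[[], str],
    connect: Callable[[str], Socket],
    chunks: Iterable[bytes],
) -> int:
    """Run step 1, then step 2, and return the number of chunks sent."""
    # Step 1: request a fresh URL per session, since URL and token are single-use.
    url = request_streaming_url()
    # Step 2: open the WebSocket and start sending audio right away.
    socket = connect(url)
    sent = 0
    try:
        for chunk in chunks:
            socket.send(chunk)
            sent += 1
    finally:
        # Always release the connection, even if sending fails.
        socket.close()
    return sent
```

Because the URL is requested inside the function, every call to `stream_once` starts from a new session rather than reusing an old URL.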
@@ -163,6 +184,7 @@ sequenceDiagram
163
184
* Audio chunk size: should not exceed 100 kilobytes or 1 second in duration
164
185
* Recommended chunk duration: 50-250 milliseconds for low latency
165
186
* Audio stream speed: maximum 2x real-time
187
+
* Timeout: If no data is received for 30 seconds, the session will be terminated
166
188
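
The limits listed above translate into a few small calculations. The sketch below assumes uncompressed PCM audio and reads "100 kilobyte" as 100,000 bytes; both are assumptions, as are all function names. It is an illustration of the arithmetic, not part of the Voice API.

```python
from typing import Iterator

MAX_CHUNK_BYTES = 100_000    # assumed reading of the 100 kilobyte limit
MAX_CHUNK_SECONDS = 1.0      # a chunk should not exceed 1 second
MAX_SPEED_FACTOR = 2.0       # audio may be sent at most 2x real-time
IDLE_TIMEOUT_SECONDS = 30.0  # session ends after 30 s without data


def chunk_size_bytes(sample_rate: int, bytes_per_sample: int, channels: int,
                     chunk_ms: int = 100) -> int:
    """Bytes per chunk for uncompressed PCM at the given chunk duration."""
    if chunk_ms / 1000 > MAX_CHUNK_SECONDS:
        raise ValueError("chunk duration exceeds 1 second")
    frames = sample_rate * chunk_ms // 1000
    size = frames * bytes_per_sample * channels
    if size > MAX_CHUNK_BYTES:
        raise ValueError("chunk exceeds 100 kilobytes")
    return size


def split_audio(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def min_send_interval(chunk_ms: int) -> float:
    """Shortest delay in seconds between chunks that stays within 2x real-time."""
    return chunk_ms / 1000 / MAX_SPEED_FACTOR


def seconds_until_timeout(last_send: float, now: float) -> float:
    """Time left before the 30-second idle timeout, never negative."""
    return max(0.0, IDLE_TIMEOUT_SECONDS - (now - last_send))
```

For example, 16 kHz, 16-bit, mono audio in 100 ms chunks gives 3,200 bytes per chunk, well inside both the size limit and the recommended 50-250 millisecond range.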
167
189
## Getting Started
168
190
@@ -173,7 +195,3 @@ To start using the Voice API:
173
195
3. Review the [WebSocket Streaming](/api-reference/voice/websocket-streaming) documentation
174
196
4. Choose your audio format and configuration
175
197
5. Implement the two-step flow in your application
176
-
177
-
<Info>
178
-
For privacy and security, streaming URLs are ephemeral and valid for one-time use only. Once a WebSocket connection is established, you must send audio data to prevent connection closure.