|
12 | 12 |
|
13 | 13 | <meta name="author" content="Mark Edmondson" /> |
14 | 14 |
|
15 | | -<meta name="date" content="2020-04-16" /> |
| 15 | +<meta name="date" content="2020-04-19" /> |
16 | 16 |
|
17 | | -<title>Google Cloud Speech API</title> |
| 17 | +<title>Google Cloud Speech-to-Text API</title> |
18 | 18 |
|
19 | 19 |
|
20 | 20 |
|
|
299 | 299 |
|
300 | 300 |
|
301 | 301 |
|
302 | | -<h1 class="title toc-ignore">Google Cloud Speech API</h1> |
| 302 | +<h1 class="title toc-ignore">Google Cloud Speech-to-Text API</h1> |
303 | 303 | <h4 class="author">Mark Edmondson</h4> |
304 | | -<h4 class="date">2020-04-16</h4> |
| 304 | +<h4 class="date">2020-04-19</h4> |
305 | 305 |
|
306 | 306 |
|
307 | 307 |
|
308 | | -<p>The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p> |
309 | | -<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech Website</a></p> |
| 308 | +<p>The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy-to-use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p> |
| 309 | +<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech-to-Text Website</a></p> |
310 | 310 | <p>The Cloud Speech API provides audio transcription. It is accessible via the <code>gl_speech</code> function.</p> |
311 | 311 | <p>Arguments include:</p> |
312 | 312 | <ul> |
@@ -337,8 +337,8 @@ <h3>Returned structure</h3> |
337 | 337 | <a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#4 0.700s 1.200s to</span></a> |
338 | 338 | <a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co"># etc...</span></a></code></pre></div> |
339 | 339 | </div> |
340 | | -<div id="demo-for-google-cloud-speech-api" class="section level3"> |
341 | | -<h3>Demo for Google Cloud Speech API</h3> |
| 340 | +<div id="demo-for-google-cloud-speech-to-text-api" class="section level3"> |
| 341 | +<h3>Demo for Google Cloud Speech-to-Text API</h3> |
342 | 342 | <p>A test audio file is installed with the package which reads:</p> |
343 | 343 | <blockquote> |
344 | 344 | <p>“To administer medicine to animals is frequently a very difficult matter, and yet sometimes it’s necessary to do so”</p> |
@@ -378,16 +378,30 @@ <h3>Word transcripts</h3> |
378 | 378 | <a class="sourceLine" id="cb3-12" data-line-number="12"><span class="co">#4 0.700s 0.900s A</span></a> |
379 | 379 | <a class="sourceLine" id="cb3-13" data-line-number="13"><span class="co">#5 0.900s 1s Dream</span></a></code></pre></div> |
380 | 380 | </div> |
| 381 | +<div id="custom-configurations" class="section level2"> |
| 382 | +<h2>Custom configurations</h2> |
| 383 | +<p>You can also pass other arguments that shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig"><code>RecognitionConfig</code></a> object. This can be done with R lists, which are converted to JSON via <code>library(jsonlite)</code>; an example is shown below:</p> |
| 384 | +<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="co">## Use a custom configuration</span></a> |
| 385 | +<a class="sourceLine" id="cb4-2" data-line-number="2">my_config <-<span class="st"> </span><span class="kw">list</span>(<span class="dt">encoding =</span> <span class="st">"LINEAR16"</span>,</a> |
| 386 | +<a class="sourceLine" id="cb4-3" data-line-number="3"> <span class="dt">diarizationConfig =</span> <span class="kw">list</span>(</a> |
| 387 | +<a class="sourceLine" id="cb4-4" data-line-number="4"> <span class="dt">enableSpeakerDiarization =</span> <span class="ot">TRUE</span>,</a> |
| 388 | +<a class="sourceLine" id="cb4-5" data-line-number="5"> <span class="dt">minSpeakerCount =</span> <span class="dv">2</span>,</a> |
| 389 | +<a class="sourceLine" id="cb4-6" data-line-number="6"> <span class="dt">maxSpeakerCount =</span> <span class="dv">3</span></a> |
| 390 | +<a class="sourceLine" id="cb4-7" data-line-number="7"> ))</a> |
| 391 | +<a class="sourceLine" id="cb4-8" data-line-number="8"></a> |
| 392 | +<a class="sourceLine" id="cb4-9" data-line-number="9"><span class="co"># languageCode is required, so will be added if not in your custom config</span></a> |
| 393 | +<a class="sourceLine" id="cb4-10" data-line-number="10"><span class="kw">gl_speech</span>(my_audio, <span class="dt">languageCode =</span> <span class="st">"en-US"</span>, <span class="dt">customConfig =</span> my_config)</a></code></pre></div> |
| 394 | +</div> |
381 | 395 | <div id="asynchronous-calls" class="section level2"> |
382 | 396 | <h2>Asynchronous calls</h2> |
383 | 397 | <p>For speech files longer than 60 seconds, or if you don’t want your results straight away, set <code>asynch = TRUE</code> in the call to the API.</p> |
384 | 398 | <p>This will return an object of class <code>"gl_speech_op"</code>, which should be passed to the <code>gl_speech_op()</code> function to check the status of the task. If the task is finished, it will return an object of the same form as in the non-asynchronous case.</p> |
385 | | -<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1">async <-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a> |
386 | | -<a class="sourceLine" id="cb4-2" data-line-number="2">async</a> |
387 | | -<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a> |
388 | | -<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a> |
389 | | -<a class="sourceLine" id="cb4-5" data-line-number="5"></a> |
390 | | -<a class="sourceLine" id="cb4-6" data-line-number="6">result <-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div> |
| 399 | +<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">async <-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a> |
| 400 | +<a class="sourceLine" id="cb5-2" data-line-number="2">async</a> |
| 401 | +<a class="sourceLine" id="cb5-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a> |
| 402 | +<a class="sourceLine" id="cb5-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a> |
| 403 | +<a class="sourceLine" id="cb5-5" data-line-number="5"></a> |
| 404 | +<a class="sourceLine" id="cb5-6" data-line-number="6">result <-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div> |
391 | 405 | </div> |
392 | 406 |
|
393 | 407 |
|
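The custom configuration section added in this change describes a plain R list that ends up as a `RecognitionConfig` JSON body. The sketch below shows that conversion, assuming the list is serialised with `jsonlite::toJSON(auto_unbox = TRUE)` (the exact call googleLanguageR makes internally is not shown in this diff). `with_language_code()` is a hypothetical helper that mimics the comment that `languageCode` is added when missing; it is not the package's own code.

```r
library(jsonlite)

# Hypothetical helper (not googleLanguageR's internal code): add the required
# languageCode only when the custom config does not already set one.
# [[ ]] is used instead of $ to avoid partial name matching.
with_language_code <- function(config, languageCode = "en-US") {
  if (is.null(config[["languageCode"]])) config[["languageCode"]] <- languageCode
  config
}

# Same custom configuration as in the vignette example
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

request_config <- with_language_code(my_config, "en-US")

# auto_unbox = TRUE writes length-one vectors as JSON scalars, not arrays
json_body <- toJSON(request_config, auto_unbox = TRUE)
json_body
```

Printing `json_body` shows the nested `diarizationConfig` object with `languageCode` appended at the end, since it was not set in `my_config`.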
|
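The asynchronous workflow above implies polling `gl_speech_op()` until a transcript comes back. A minimal polling sketch follows, assuming that `gl_speech_op()` keeps returning an object of class `"gl_speech_op"` while the job is still running (the vignette only states what is returned once the task is finished). `poll_until_done()`, `fake_op` and `fake_check()` are hypothetical names used so the loop can be exercised offline without credentials.

```r
# Hypothetical helper: call `check()` on an operation until `is_done()`
# reports a final result, sleeping `interval` seconds between checks.
poll_until_done <- function(op, check, is_done, interval = 10, max_tries = 30) {
  for (i in seq_len(max_tries)) {
    if (is_done(op)) return(op)
    Sys.sleep(interval)
    op <- check(op)
  }
  if (is_done(op)) return(op)
  stop("Operation still running after ", max_tries, " checks")
}

# Offline stand-in for an async job that finishes on the third check
fake_op <- structure(list(n = 0), class = "fake_op")
fake_check <- function(op) {
  op$n <- op$n + 1
  if (op$n >= 3) list(transcript = "done") else op
}

result <- poll_until_done(fake_op, fake_check,
                          is_done = function(x) !inherits(x, "fake_op"),
                          interval = 0)
result$transcript

# With googleLanguageR (needs authentication, so not run here), assuming an
# unfinished job comes back as a "gl_speech_op" object:
# async  <- gl_speech(test_audio, asynch = TRUE)
# result <- poll_until_done(async, gl_speech_op,
#                           is_done = function(x) !inherits(x, "gl_speech_op"))
```

Capping the number of checks with `max_tries` keeps a stalled job from blocking an R session indefinitely; the interval and cap are illustrative defaults, not values recommended by the API.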