Commit 916510e

rename build folder #70 and add some docs
1 parent a318edc commit 916510e

7 files changed

Lines changed: 419 additions & 107 deletions

.Rbuildignore

Lines changed: 1 addition & 1 deletion
@@ -9,5 +9,5 @@
 ^\.httr-oauth$
 ^cran-comments\.md$
 ^\.Renviron$
-^build$
+^cloud_build$
 ^CRAN-RELEASE$
File renamed without changes.

vignettes/speech.Rmd

Lines changed: 22 additions & 5 deletions
@@ -1,17 +1,17 @@
 ---
-title: "Google Cloud Speech API"
+title: "Google Cloud Speech-to-Text API"
 author: "Mark Edmondson"
 date: "`r Sys.Date()`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Google Cloud Speech API}
+  %\VignetteIndexEntry{Google Cloud Speech-to-Text API}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---

-The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.
+The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.

-Read more [on the Google Cloud Speech Website](https://cloud.google.com/speech/)
+Read more [on the Google Cloud Speech-to-Text Website](https://cloud.google.com/speech/)

 The Cloud Speech API provides audio transcription. It's accessible via the `gl_speech` function.

@@ -47,7 +47,7 @@ return$timings
 # etc...
 ```

-### Demo for Google Cloud Speech API
+### Demo for Google Cloud Speech-to-Text API


 A test audio file is installed with the package which reads:
@@ -96,6 +96,23 @@ result$timings
 #5 0.900s 1s Dream
 ```

+## Custom configurations
+
+You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a [`RecognitionConfig`](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) object. This can be done via R lists, which are converted to JSON via `library(jsonlite)`; an example is shown below:
+
+```r
+## Use a custom configuration
+my_config <- list(encoding = "LINEAR16",
+                  diarizationConfig = list(
+                    enableSpeakerDiarization = TRUE,
+                    minSpeakerCount = 2,
+                    maxSpeakerCount = 3
+                  ))
+
+# languageCode is required, so will be added if not in your custom config
+gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
+```
+
 ## Asynchronous calls

 For speech files greater than 60 seconds or if you don't want your results straight away, set `asynch = TRUE` in the call to the API.
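
As a rough sketch of what the custom configuration added to this vignette looks like once serialised, the nested list can be passed through `jsonlite` directly, without calling the API. The `auto_unbox = TRUE` and `pretty = TRUE` settings are assumptions made here for illustration; the package may use different conversion options internally. Field names follow the linked `RecognitionConfig` reference.

```r
library(jsonlite)

# Build the configuration as nested R lists (no API call is made here)
my_config <- list(
  encoding = "LINEAR16",
  diarizationConfig = list(
    enableSpeakerDiarization = TRUE,
    minSpeakerCount = 2,
    maxSpeakerCount = 3
  )
)

# Assumption: auto_unbox = TRUE so length-one vectors become JSON scalars
config_json <- toJSON(my_config, auto_unbox = TRUE, pretty = TRUE)
print(config_json)

# Round-trip to check the structure survives the conversion
parsed <- fromJSON(config_json)
```

Inspecting the JSON this way is a cheap check for misspelled field names before sending audio to the API.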

vignettes/speech.html

Lines changed: 28 additions & 14 deletions
@@ -12,9 +12,9 @@

 <meta name="author" content="Mark Edmondson" />

-<meta name="date" content="2020-04-16" />
+<meta name="date" content="2020-04-19" />

-<title>Google Cloud Speech API</title>
+<title>Google Cloud Speech-to-Text API</title>



@@ -299,14 +299,14 @@



-<h1 class="title toc-ignore">Google Cloud Speech API</h1>
+<h1 class="title toc-ignore">Google Cloud Speech-to-Text API</h1>
 <h4 class="author">Mark Edmondson</h4>
-<h4 class="date">2020-04-16</h4>
+<h4 class="date">2020-04-19</h4>



-<p>The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
-<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech Website</a></p>
+<p>The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
+<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech-to-Text Website</a></p>
 <p>The Cloud Speech API provides audio transcription. It’s accessible via the <code>gl_speech</code> function.</p>
 <p>Arguments include:</p>
 <ul>
@@ -337,8 +337,8 @@ <h3>Returned structure</h3>
 <a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#4 0.700s 1.200s to</span></a>
 <a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co"># etc...</span></a></code></pre></div>
 </div>
-<div id="demo-for-google-cloud-speech-api" class="section level3">
-<h3>Demo for Google Cloud Speech API</h3>
+<div id="demo-for-google-cloud-speech-to-text-api" class="section level3">
+<h3>Demo for Google Cloud Speech-to-Text API</h3>
 <p>A test audio file is installed with the package which reads:</p>
 <blockquote>
 <p>“To administer medicine to animals is frequently a very difficult matter, and yet sometimes it’s necessary to do so”</p>
@@ -378,16 +378,30 @@ <h3>Word transcripts</h3>
 <a class="sourceLine" id="cb3-12" data-line-number="12"><span class="co">#4 0.700s 0.900s A</span></a>
 <a class="sourceLine" id="cb3-13" data-line-number="13"><span class="co">#5 0.900s 1s Dream</span></a></code></pre></div>
 </div>
+<div id="custom-configurations" class="section level2">
+<h2>Custom configurations</h2>
+<p>You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig"><code>RecognitionConfig</code></a> object. This can be done via R lists, which are converted to JSON via <code>library(jsonlite)</code>; an example is shown below:</p>
+<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="co">## Use a custom configuration</span></a>
+<a class="sourceLine" id="cb4-2" data-line-number="2">my_config &lt;-<span class="st"> </span><span class="kw">list</span>(<span class="dt">encoding =</span> <span class="st">&quot;LINEAR16&quot;</span>,</a>
+<a class="sourceLine" id="cb4-3" data-line-number="3"> <span class="dt">diarizationConfig =</span> <span class="kw">list</span>(</a>
+<a class="sourceLine" id="cb4-4" data-line-number="4"> <span class="dt">enableSpeakerDiarization =</span> <span class="ot">TRUE</span>,</a>
+<a class="sourceLine" id="cb4-5" data-line-number="5"> <span class="dt">minSpeakerCount =</span> <span class="dv">2</span>,</a>
+<a class="sourceLine" id="cb4-6" data-line-number="6"> <span class="dt">maxSpeakerCount =</span> <span class="dv">3</span></a>
+<a class="sourceLine" id="cb4-7" data-line-number="7"> ))</a>
+<a class="sourceLine" id="cb4-8" data-line-number="8"></a>
+<a class="sourceLine" id="cb4-9" data-line-number="9"><span class="co"># languageCode is required, so will be added if not in your custom config</span></a>
+<a class="sourceLine" id="cb4-10" data-line-number="10"><span class="kw">gl_speech</span>(my_audio, <span class="dt">languageCode =</span> <span class="st">&quot;en-US&quot;</span>, <span class="dt">customConfig =</span> my_config)</a></code></pre></div>
+</div>
 <div id="asynchronous-calls" class="section level2">
 <h2>Asynchronous calls</h2>
 <p>For speech files greater than 60 seconds or if you don’t want your results straight away, set <code>asynch = TRUE</code> in the call to the API.</p>
 <p>This will return an object of class <code>&quot;gl_speech_op&quot;</code> which should be used within the <code>gl_speech_op()</code> function to check the status of the task. If the task is finished, then it will return an object the same form as the non-asynchronous case.</p>
-<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
-<a class="sourceLine" id="cb4-2" data-line-number="2">async</a>
-<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
-<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
-<a class="sourceLine" id="cb4-5" data-line-number="5"></a>
-<a class="sourceLine" id="cb4-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
+<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
+<a class="sourceLine" id="cb5-2" data-line-number="2">async</a>
+<a class="sourceLine" id="cb5-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
+<a class="sourceLine" id="cb5-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
+<a class="sourceLine" id="cb5-5" data-line-number="5"></a>
+<a class="sourceLine" id="cb5-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
 </div>


vignettes/text-to-speech.Rmd

Lines changed: 28 additions & 0 deletions
@@ -80,6 +80,34 @@ gl_talk("Would you like a cup of tea?", gender = "FEMALE", languageCode = "en-GB

 Some languages are not yet supported, such as Danish. The API will return an error in those cases.

+## Support for SSML
+
+Support is also included for Speech Synthesis Markup Language (SSML) - more details on using this to insert pauses, sounds and breaks in your audio can be found here: `https://cloud.google.com/text-to-speech/docs/ssml`
+
+To use, wrap the text you want spoken in SSML markup and set `inputType = "ssml"`:
+
+```r
+# using SSML
+gl_talk('<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
+standard <break time=\"1s\"/>is defined by the
+<sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>',
+        inputType = "ssml")
+```
+
+## Effect Profiles
+
+You can output audio files that are optimised for playing on various devices.
+
+To use audio profiles, supply a character vector of the available audio profiles listed here: `https://cloud.google.com/text-to-speech/docs/audio-profiles` - the audio profiles are applied in the order given.
+
+For instance, `effectsProfileIds="wearable-class-device"` will optimise output for smart watches, while `effectsProfileIds=c("wearable-class-device","telephony-class-application")` will apply sound filters optimised for smart watches, then telephonic devices.
+
+```r
+# using effects profiles
+gl_talk("This sounds great on headphones",
+        effectsProfileIds = "headphone-class-device")
+```
+
 ## Browser Speech player

 Creating and clicking on the audio file to play it can be a bit of a drag, so you also have a function that will play the audio file for you, launching via the browser. This can be piped via the tidyverse's `%>%`
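
For the SSML and effects-profile sections added in this file, the inputs can be prepared in plain R before any API call. The sketch below is illustrative only: `ssml_with_pause()` is a hypothetical helper, not part of the package, and it does not escape XML special characters in its inputs. Passing `inputType` and `effectsProfileIds` together in one `gl_talk()` call is also an assumption, so that call is left commented out.

```r
# Hypothetical helper (not part of the package): wrap two pieces of text in
# <speak> tags with a pause of the given length between them.
ssml_with_pause <- function(first, second, pause_seconds = 1) {
  sprintf('<speak>%s <break time="%ds"/>%s</speak>',
          first, as.integer(pause_seconds), second)
}

ssml <- ssml_with_pause("Would you like a cup of tea?", "Milk and sugar?", 2)
print(ssml)

# Effects profiles are a plain character vector, applied in the order given
profiles <- c("wearable-class-device", "telephony-class-application")

# With authentication set up, these might then be passed on (assumption):
# gl_talk(ssml, inputType = "ssml", effectsProfileIds = profiles)
```

Building the markup with `sprintf()` and double quotes inside a single-quoted string avoids the backslash escapes needed when quoting styles are mixed.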
