getVOT is a developmental R package with functions that aim to predict
positive or negative voice onset time from bare or annotated sound
files. VOT prediction itself is done by loading (partial) sound files
into R and using largely base R functions to predict the usual VOT
landmarks from a vector of audio samples. Options are available to
generate Praat TextGrids or add a tier
to existing TextGrids with predicted VOT, or to add an annotation level
with predicted VOT to an existing EMU
database (although the latter
functionality is significantly slower).
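To give a feel for what finding VOT landmarks in a vector of audio samples can look like, here is a toy sketch in base R. It is not getVOT's actual algorithm: the synthetic signal, the amplitude threshold, the frame sizes, and the use of the frame midpoint are all assumptions made for illustration. It locates the release as the first clear amplitude rise and the voicing onset as the first frame with strong autocorrelation.

```r
set.seed(1)
sr <- 16000  # sampling rate in Hz

# Synthetic "stop": 50 ms near-silent closure, 40 ms noisy aspiration,
# then 100 ms of periodic voicing at 120 Hz. True VOT is therefore 40 ms.
closure    <- rnorm(round(0.05 * sr), sd = 0.001)
aspiration <- rnorm(round(0.04 * sr), sd = 0.1)
voicing    <- 0.5 * sin(2 * pi * 120 * seq_len(round(0.10 * sr)) / sr)
sound <- c(closure, aspiration, voicing)

# Release: first sample that clearly exceeds the closure noise floor
# (the 0.02 threshold is arbitrary and tuned to this toy signal)
release <- which(abs(sound) > 0.02)[1]

# Voicing onset: slide a 30 ms frame in 1 ms steps and take the first frame
# whose autocorrelation peaks above 0.5 at a lag in a plausible pitch range
frame_len <- round(0.03 * sr)
starts <- seq(1, length(sound) - frame_len + 1, by = round(0.001 * sr))
lags <- round(sr / 400):round(sr / 75)  # lags for 400 Hz down to 75 Hz
peak_acf <- vapply(starts, function(s) {
  frame <- sound[s:(s + frame_len - 1)]
  r <- acf(frame, lag.max = max(lags), plot = FALSE)$acf
  max(r[lags + 1])  # element 1 of r is lag 0
}, numeric(1))
voiced_frame <- starts[which(peak_acf > 0.5)[1]]
voicing_onset <- voiced_frame + frame_len / 2  # crude choice: frame midpoint

# Predicted VOT in milliseconds
vot_ms <- (voicing_onset - release) / sr * 1000
```

On this clean synthetic signal, the estimate should land in the vicinity of the true 40 ms. Real recordings are much messier, which is why the package exposes tuning parameters.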
The goal is to partially automate, or at least aid in, the job of
annotating VOT. There are other and more sophisticated algorithms
available for predicting VOT, including
AutoVOT and
Dr.VOT. The job of getVOT is not
to replace or even necessarily outperform these. AutoVOT only predicts
positive VOT and scales best with a fair amount of training data; Dr.VOT
predicts both positive and negative VOT, but can be a bit daunting to
set up and difficult to manipulate; both can only be used on Unix-style
operating systems. getVOT should be easy to set up and get started
with, should be compatible with all operating systems, and should scale
fairly well with little to no training data.* This makes getVOT
well suited to smaller data sets where providing suitable training data
is difficult.
I also created getVOT in large part as a coding exercise. I
felt that it should be possible to find the regular VOT landmarks
using a relatively simple algorithm with few bells and whistles, and I
have personally been rather happy with the results. Because I know
exactly what getVOT does under the hood, it’s also usually
straightforward to figure out why it fails when it fails.
*This is a beta version of getVOT! If you find that any of these
claims do not match your experience, I’d be very happy to hear about it!
You can reach out at r.puggaard at phonetik.uni-muenchen.de.
You can install the development version of getVOT from GitHub with:
#install.packages('devtools')
devtools::install_github('rpuggaardrode/getVOT')

I give a few examples below of how the main functions of getVOT work.
There is a longer tutorial on
LingMethodsHub,
which also gets into the nitty-gritty of how it works under the hood.
Probably the two most important getVOT functions in daily use will be
VOT2newTG and addVOT2TG. As the names suggest, VOT2newTG will
generate a new TextGrid with predicted VOT where no annotations exist
already. In this case, sound files should be short and consist of only a
single word beginning with a stop. addVOT2TG adds a new tier to an
existing TextGrid with predicted VOT. In this case, sound files can be
longer, but the existing TextGrid should indicate the rough location of
stops.
VOT2newTG works like this:
VOT2newTG(directory='my_directory', sign='positive')

my_directory should be a directory of sound files. The sign argument
specifies whether positive or negative VOT should be predicted. This
argument is optional; the default is to use a simple algorithm to try to
predict whether VOT is positive or negative, but this algorithm is not
all that precise (yet).
The function will create a new subdirectory of my_directory containing
TextGrids with VOT annotations.
addVOT2TG works like this:
addVOT2TG(directory='my_directory', tg_tier='my_word_tier',
          seg_list=c('p', 't', 'k'), sign='positive')

In this case, my_directory should be a directory of sound-TextGrid
pairs. tg_tier gives the name of the TextGrid tier where the rough
locations of stops are indicated. seg_list is a character vector of stop
symbols to look for in my_word_tier. Labels do not need to match exactly:
in this case, VOT will be predicted for all intervals of my_word_tier
where the first symbol is p, t, or k, ignoring stress symbols.
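The matching rule described above can be sketched in base R as follows. This is only an illustration of the idea, not the code addVOT2TG runs internally; the interval labels are made up, and the assumption that stress is marked with the IPA symbols ˈ and ˌ is mine.

```r
# Hypothetical interval labels from a word tier (made up for illustration).
# \u02c8 and \u02cc are the IPA primary and secondary stress marks.
labels <- c('\u02c8pata', 'kat', '\u02cctiko', 'sol', 'apa', '')
seg_list <- c('p', 't', 'k')

# Assumption: stress symbols are removed before looking at the first symbol
stripped <- gsub('[\u02c8\u02cc]', '', labels)

# An interval is a target if its first remaining symbol is in seg_list
first_symbol <- substr(stripped, 1, 1)
is_target <- first_symbol %in% seg_list

# Labels of the intervals for which VOT would be predicted
labels[is_target]
```

Here the first three labels are selected, while 'sol', 'apa', and the empty interval are skipped.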
VOT2newTG and addVOT2TG also take the optional arguments
pos_params_list and neg_params_list; these are lists of some (rather
obscure) parameters that are used to tweak VOT prediction. Under the
hood, these parameters are passed on to the functions positiveVOT and
negativeVOT. These are the default arguments:
args(getVOT::positiveVOT)
#> function (sound, sr, closure_interval = 10, release_param = 15,
#> vo_method = "acf", vo_granularity = 1, vo_param = 0.85, f0_wl = 30,
#> f0_minacf = 0.5, burst_only = FALSE, f0_first = FALSE, plot = TRUE,
#> params_list = NULL)
#> NULL
args(getVOT::negativeVOT)
#> function (sound, sr, vo_method = "acf", closure_interval = 10,
#> vo_granularity = 1.2, vo_param = 0.9, f0_wl = 50, f0_minacf = 0.5,
#> vo_only = FALSE, rel_method = "transient", plot = TRUE, params_list = NULL)
#> NULL

More info about each of these can be found in the respective help files,
?positiveVOT and ?negativeVOT. Mileage may vary with the defaults –
they were chosen because I have found that they usually work fairly
well, but no settings of these parameters will give equally good results
for all data. Parameter lists can be edited by hand, but getVOT also
offers the functions pos_setParams and neg_setParams which will help
optimize parameters for your data.
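As a minimal sketch of editing a parameter list by hand, the snippet below starts from the positiveVOT defaults shown in the args() output above and overrides two of them. It assumes that pos_params_list is a named list whose names match the positiveVOT argument names; check ?positiveVOT to confirm the expected format. The override values are arbitrary.

```r
# Defaults copied from the args(getVOT::positiveVOT) output
default_pos_params <- list(
  closure_interval = 10, release_param = 15, vo_method = 'acf',
  vo_granularity = 1, vo_param = 0.85, f0_wl = 30, f0_minacf = 0.5,
  burst_only = FALSE, f0_first = FALSE
)

# Override a couple of settings (values chosen only for illustration)
my_pos_params <- modifyList(default_pos_params,
                            list(vo_param = 0.8, f0_wl = 40))

# Not run here, since it needs getVOT and a directory of sound files:
# getVOT::VOT2newTG(directory = 'my_directory', sign = 'positive',
#                   pos_params_list = my_pos_params)
```

Using modifyList keeps all other defaults intact, so only the settings you care about need to be spelled out.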
In order to use pos_setParams and neg_setParams you should provide a
few (say, 5-10) representative examples of sound file-TextGrid pairs,
where the TextGrid has a single tier with manual VOT annotation. The
setParams functions will then try out a range of different parameter
settings, and return a list with the settings that are found to minimize
the difference between predicted VOT and manually labeled VOT.
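The general idea of trying out a range of settings and keeping the best one can be illustrated with a toy grid search. This is not how pos_setParams or neg_setParams are implemented: the manual annotations are made up, toy_predict is a hypothetical stand-in for a real VOT predictor, and the candidate values are arbitrary.

```r
# Made-up manual VOT annotations (in ms) for five training tokens
manual_vot <- c(62, 48, 75, 55, 68)

# Hypothetical stand-in for a VOT predictor. By construction, its error
# grows as the two settings move away from vo_param = 0.8 and f0_wl = 30.
toy_predict <- function(vo_param, f0_wl) {
  manual_vot + 100 * abs(vo_param - 0.8) + 0.2 * abs(f0_wl - 30)
}

# Every combination of candidate settings
grid <- expand.grid(vo_param = seq(0.7, 0.95, by = 0.05),
                    f0_wl = c(20, 30, 40, 50))

# Mean absolute difference between predicted and manual VOT per combination
grid$mean_abs_err <- mapply(
  function(v, w) mean(abs(toy_predict(v, w) - manual_vot)),
  grid$vo_param, grid$f0_wl
)

# Keep the combination with the smallest error as a named list
best <- as.list(grid[which.min(grid$mean_abs_err), c('vo_param', 'f0_wl')])
```

With this toy predictor, the search recovers vo_param = 0.8 and f0_wl = 30, the settings that were built in as the optimum.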
You simply need to pass the functions a directory, like so:
optimized_params <- pos_setParams(directory='my_training_data')

The resulting list optimized_params can then be passed on to functions
such as VOT2newTG, like so:
VOT2newTG(directory='my_directory', sign='positive',
pos_params_list=optimized_params)