author: steeve LAQUITAINE
date: 28/08/2021
Short description: Voice activity detection is critical for reducing the computational cost of continuously monitoring the large volumes of speech data needed to swiftly detect command utterances such as wake words. My objective was to build a neural-network-based Voice Activity Detector (VAD) with reasonable performance (a low false rejection rate) within a week and with low computing resources. I trained and tested the model on labelled audio data containing speech from diverse speakers: male, female and synthetic voices, at low and high volume, and with slow and fast speech rates. The dataset came from LibriSpeech and was prepared and provided by SONOS. I used a variety of tools to extract and preprocess the data and to develop and test the model, relying mostly on TensorFlow's advanced Subclassing API, TensorFlow ops and Keras, along with TensorBoard, Seaborn and the more classical Matplotlib visualization tools to make sense of the data, clean it and inspect the inner workings of the model.
notebooks/report.pdf
notebooks/report.ipynb
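The model is built with TensorFlow's Subclassing API. Below is a minimal sketch of what a subclassed per-frame classifier can look like; the class name `FrameVAD`, the layer sizes and the 13-feature input are illustrative assumptions, not the project's actual architecture:

```python
import tensorflow as tf


class FrameVAD(tf.keras.Model):
    """Hypothetical per-frame speech/silence classifier (illustrative only)."""

    def __init__(self, n_hidden: int = 32):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(n_hidden, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")  # P(speech)

    def call(self, inputs, training=False):
        return self.out(self.hidden(inputs))


# Classify a batch of 4 frames, each described by 13 illustrative features
model = FrameVAD()
probs = model(tf.random.normal((4, 13)))
```

Subclassing (rather than the Sequential or functional APIs) makes the forward pass explicit, which helps when inspecting intermediate computations in TensorBoard.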
Prerequisites:
You have:
- conda 4.8.3 (`which conda` in a terminal)
- Git 2.32.0
You can get and run the codebase in 3 steps:

- Setup:

```bash
git clone https://github.com/slq0/vad_deepnet.git
cd vad_deepnet
conda create -n vad python==3.6.13
conda activate vad
pip install kedro==0.17.4
bash setup.sh
```

- Move the dataset to vad_deepnet/data/01_raw/

- Run basic model training (takes ~30 min) and predict-and-eval (~20 secs):

```bash
kedro run --pipeline train --env train
kedro run --pipeline predict_and_eval --env predict_and_eval
```
Development:
- VSCODE: coding in an integrated development environment
- Conda: isolate environment and manage dependencies
- Git: code versioning
- Github: centralize repo for collaboration
- Kedro: standardize codebase

Experiment tracking & reproducibility:
- mlflow: pipeline parameters & model experiment tracking
- Tensorboard: model experiment inspection & optimization
- git-graph: keep track of the flow of commits and branches

Readability:
- black: codebase formatting
- pylint: codebase linting

Test coverage:
- pytest: minimal unit-tests
Create a conda environment and install Python and Kedro for codebase standardization:

```bash
conda create -n vad python==3.6.13 kedro==0.17.4
```

I used a light version of the Gitflow Workflow methodology for code versioning and collaboration:

- A Master branch is our production branch (final deployment)
- I created and moved to a Develop branch, then branched out a feature branch to start developing
- The Develop branch would hypothetically be an integration branch (for continuous integration and testing)
- I kept track of my commits and the workflow of branches with git-graph
```bash
git clone https://github.com/slq0/vad_deepnet.git
```

Run this bash script to build and install the project's dependencies:

```bash
bash setup.sh
```

Train the basic model by running the training pipeline:

```bash
kedro run --pipeline train --env train
```

Run inference with the model:

```bash
kedro run --pipeline predict --env predict
```

Evaluate its performance metrics:

```bash
kedro run --pipeline predict_and_eval --env predict_and_eval
```

Visualize the layers' weights and biases across epochs, the training and validation losses, the performance metrics on validation data, and the model's conceptual and structural graphs, to dive into decreasing levels of abstraction.
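Since the stated goal is a low false rejection rate, the core evaluation amounts to counting missed speech frames. Below is a minimal NumPy sketch of framewise false rejection and false alarm rates; the exact metric definitions used by the predict_and_eval pipeline are assumed here, not taken from the codebase:

```python
import numpy as np


def frr_far(y_true, y_pred):
    """False rejection rate (speech frames classified as silence)
    and false alarm rate (silence frames classified as speech)."""
    y_true = np.asarray(y_true, dtype=bool)  # True = speech frame
    y_pred = np.asarray(y_pred, dtype=bool)
    frr = np.mean(~y_pred[y_true])   # missed speech / total speech frames
    far = np.mean(y_pred[~y_true])   # false alarms / total silence frames
    return frr, far


# Example: 4 speech frames, one missed -> FRR = 0.25; 2 silence frames, one false alarm -> FAR = 0.5
frr, far = frr_far([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```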
The model runs are logged in tbruns/:

```bash
tensorboard --logdir tbruns
# http://localhost:6006/
```

I used mlflow to track experiments and to test hyperparameter runs (e.g., run duration). The logs are stored in mlruns/:

```bash
kedro mlflow ui --env train --host 127.0.0.1 --port 6007
```
Then open http://localhost:6007/ in your browser.

To keep track of the pipeline and optimize it, I used Kedro-Viz, which describes the pipelines with directed acyclic graphs:
```bash
kedro viz
# http://127.0.0.1:4141
```

Run unit-tests on the codebase. I initialized unit-tests but did not have to implement more than one test. You can run them with:
```bash
pytest src/tests/test_run.py
```

You can open the package's Sphinx documentation by opening docs/build/html/index.html
in your web browser (double-click on the file):
```bash
kedro build-docs --open
```

We can use the pure speech and noise corpora below for the speech vs. silence classes. We can also augment the pure speech dataset with noisy speech created by summing speech and noise data.
- TIMIT: corpus of clean speech (1) - license: ([TODO]: check)
- QUT-NOISE: corpus of noise (1) - license: CC-BY-SA ([TODO]: check)
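Summing speech and noise is usually done at a controlled signal-to-noise ratio (SNR). Below is a minimal NumPy sketch of such an augmentation step; scaling the noise to hit a target SNR is a common convention, assumed here rather than taken from the project:

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at a target SNR (in dB), by scaling the noise."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that 10*log10(p_speech / p_scaled_noise) == snr_db
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # 1 s of stand-in "speech" at 16 kHz
noise = rng.standard_normal(16000)   # stand-in noise
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```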
The final report is notebooks/report.pdf, with a collapsible table of contents
(see it in Preview on Mac or Adobe Reader on Windows).

To format the .ipynb report into a .pdf, run in the terminal:

```bash
jupyter nbconvert notebooks/report.ipynb --to pdf --no-input
```

note: references are formatted according to the American Psychological Association (APA) style.
(1) Dean, D., Sridharan, S., Vogt, R., & Mason, M. (2010). The QUT-NOISE-TIMIT corpus for evaluation of voice activity detection algorithms. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (pp. 3110-3113). International Speech Communication Association.