ECAPA-Based Speaker Verification of Virtual Assistants: A Transfer Learning Approach
Abstract
This work applies transfer learning with the ECAPA-TDNN model, pre-trained on the VoxCeleb2 dataset, to verify the synthetic voices of virtual assistants.
Intra-voice assistant comparisons: Achieved accuracies of 83.33% (iOS) and 66.67% (Alexa) for text-independent samples and 50% for text-dependent samples.
Inter-voice assistant comparisons (Alexa, Siri, Google Assistant, Cortana): 100% accuracy for text-independent, 80% for text-dependent.
Demonstrates the effectiveness of transfer learning and ECAPA-TDNN model for secure speaker verification across speech assistant versions.
Valuable insights for enhancing speaker verification in the context of speech assistants.
Introduction
Speaker verification relies on speech characteristics such as pitch, formants, the spectral envelope, MFCCs, and prosody.
"Voice prints" represent a speaker's unique vocal qualities.
Two types of speaker verification: text-dependent (TDSV), where a fixed phrase is spoken, and text-independent (TISV), where any utterance may be used.
Transfer learning employs pre-trained models to improve performance when labeled data is scarce.
The ECAPA-TDNN model from the SpeechBrain toolkit is used in this study for transfer learning on virtual assistants.
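As a concrete sketch, the pre-trained verifier can be loaded through SpeechBrain's `SpeakerRecognition` interface. The file paths and `savedir` below are placeholders; the snippet assumes `speechbrain` is installed, and the model weights (~80 MB) are fetched from HuggingFace on first use:

```python
def verify_pair(path_a: str, path_b: str):
    """Score two utterances with the VoxCeleb-trained ECAPA-TDNN model."""
    try:  # SpeechBrain >= 1.0
        from speechbrain.inference import SpeakerRecognition
    except ImportError:  # older SpeechBrain releases
        from speechbrain.pretrained import SpeakerRecognition
    model = SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",
        savedir="pretrained_models/spkrec-ecapa-voxceleb",  # local cache dir
    )
    score, prediction = model.verify_files(path_a, path_b)
    return float(score), bool(prediction)
```

`verify_files` returns a cosine-similarity score between the two embeddings and a boolean decision against the model's default threshold.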
Methodology
Dataset
A custom audio dataset was created with a subset selected for analysis.
Organized into:
Intra-pair Comparisons:
Siri Versions (iOS 9 vs iOS 10 vs iOS 11)
Alexa Versions (3rd gen vs 4th gen vs 5th gen)
Inter-pair Comparisons:
Alexa
Siri
Google Assistant
Cortana
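The trial pairs implied by this grouping can be enumerated with `itertools`; the variable names below are illustrative, not taken from the project code:

```python
from itertools import combinations

siri_versions = ["iOS 9", "iOS 10", "iOS 11"]
alexa_versions = ["3rd gen", "4th gen", "5th gen"]
assistants = ["Alexa", "Siri", "Google Assistant", "Cortana"]

# Every unordered pair within a group (intra) or across assistants (inter).
trials = {
    "intra_siri": list(combinations(siri_versions, 2)),
    "intra_alexa": list(combinations(alexa_versions, 2)),
    "inter": list(combinations(assistants, 2)),
}
print(len(trials["inter"]))  # 6 unordered assistant pairs
```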
SpeechBrain
Features the ECAPA-TDNN model, a state-of-the-art speaker-recognition architecture that combines a TDNN backbone with multi-layer feature aggregation (MFA), Squeeze-and-Excitation (SE) blocks, and Res2Net-style residual blocks.
Hyperparameters are detailed in a YAML format.
Data Loading makes use of a PyTorch dataset interface.
Batching includes extracting speech features like spectrograms and MFCCs.
The `Brain` class simplifies the neural-model training loop.
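A SpeechBrain-style hyperparameter file might look like the following; the values are illustrative, not the project's exact recipe:

```yaml
# Illustrative SpeechBrain-style hyperparameters (not the exact recipe used here)
seed: 1986
number_of_epochs: 10
batch_size: 32
lr: 0.001
n_mels: 80        # 80-dimensional filterbank features
emb_dim: 192      # ECAPA-TDNN speaker-embedding size
```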
Pre-trained Model: ECAPA-TDNN
SpeechBrain provides ready-to-use pre-trained models such as ECAPA-TDNN; its training pipeline comprises:
Data preprocessing: Extract 80-dimensional filterbank features.
Model initialization: 5 TDNN layers, an attention mechanism, and an MLP classifier.
Hyperparameter setting: epochs, batch size, learning rate, etc.
Training: Trained on the VoxCeleb2 dataset.
Validation and Testing: Evaluate on a validation set.
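The 80-dimensional filterbank features from the preprocessing step can be sketched in plain NumPy. The frame and hop sizes below (25 ms and 10 ms at 16 kHz) are common defaults, not necessarily the exact values used:

```python
import numpy as np

def mel_filterbank(sr=16000, n_fft=400, n_mels=80):
    """Triangular mel filterbank matrix of shape (n_mels, n_fft//2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_mels + 2 equally spaced points on the mel scale -> FFT bin indices
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_features(signal, sr=16000, n_fft=400, hop=160, n_mels=80):
    """Frame the signal, take magnitude spectra, apply mel filters, log-compress."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(mag @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)

rng = np.random.default_rng(0)
feats = log_mel_features(rng.standard_normal(16000))  # one second of audio
print(feats.shape)  # (98, 80): 98 frames x 80 mel bands
```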
Implementation
- Normalize, denoise, and extract features from audio samples.
- Adjust the ECAPA-TDNN model's initial layer for TDSV and TISV.
- Use the model to verify speaker identities and obtain similarity scores.
- Store scores and predictions in arrays.
- Calculate accuracy, precision, recall, and F1 score for evaluation.
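The evaluation step can be sketched in plain Python; the labels below are illustrative trial pairs, with 1 for same-speaker and 0 for different-speaker:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary trial labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy example: 6 trial pairs, one false reject and one false accept.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))  # all four equal 2/3 here
```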
Results
Output Snippets
Conclusion
Intra-pair TDSV analysis shows that the voices of all versions match one another, raising potential security concerns.
Inter-pair TDSV analysis found matches between Cortana and both Google Assistant and Alexa.
TISV achieves higher accuracy than TDSV because the model can discriminate speakers across differing texts.
For better performance, additional training on a broader dataset of synthetic voices is recommended.
The study emphasizes the potential of transfer learning and SpeechBrain for speaker verification, also acknowledging challenges with synthetic voices.
About
Speaker verification on speech assistants using the ECAPA-TDNN model, focusing on intra- and inter-voice-assistant variation and emphasizing the potential of transfer learning for secure speaker verification.