VAD on singing voice?

I am trying to adapt this script to detect voice and silence segments in an audio file containing a source-separated singing voice signal obtained from [http://github.com/sigsep/open-unmix-pytorch](https://github.com/sigsep/open-unmix-pytorch).

I have some questions:

* Does it make sense to compute the threshold for each `data_window` independently, instead of using a fixed `speech_energy_threshold`? I would do that by computing the energy of the `data_window` signal, normalizing it, and taking its mean value. If this value equals 0.0, I would label that segment as silence.

* Is there a clever way to choose parameters like `sample_window`, `sample_overlap`, `speech_window` that would be more appropriate for singing voice signals?
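
To make the first question concrete, here is a rough sketch of the energy idea. It is not taken from the script: the function names, `rel_threshold`, and the 20 ms / 10 ms frame sizes are placeholders I made up. It normalizes each frame's energy by the loudest frame in the file and compares against a small tolerance instead of an exact 0.0, on the assumption that a separated vocal stem is rarely exactly zero.

```python
import numpy as np


def frame_energies(signal, rate, window_s=0.02, hop_s=0.01):
    """Return the mean-square energy of each analysis frame.

    window_s and hop_s are hypothetical defaults, not values from the script.
    """
    win = max(1, int(round(window_s * rate)))
    hop = max(1, int(round(hop_s * rate)))
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short inputs so that at least one full frame exists.
    if len(signal) < win:
        signal = np.pad(signal, (0, win - len(signal)))
    n_frames = 1 + (len(signal) - win) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + win]
        energies[i] = np.mean(frame ** 2)
    return energies


def label_silence(signal, rate, rel_threshold=1e-3, **kwargs):
    """Return a boolean array that is True for frames treated as silence.

    A frame counts as silence when its energy, divided by the energy of the
    loudest frame in the file, is below rel_threshold (an assumed tolerance).
    """
    energies = frame_energies(signal, rate, **kwargs)
    peak = energies.max()
    if peak == 0.0:
        # The whole file is digital silence.
        return np.ones(len(energies), dtype=bool)
    return (energies / peak) < rel_threshold
```

With one second of zeros followed by one second of a 220 Hz tone at 16 kHz, `label_silence(signal, 16000)` marks the frames in the first second as silence and the frames in the second as voiced. Normalizing each window only by its own peak would hide the level difference between windows, which is why the sketch uses a file-wide reference.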

Thanks a lot!
