I am trying to adapt this script to detect voice/silence segments in an audio file containing a source-separated singing-voice signal obtained from http://github.com/sigsep/open-unmix-pytorch.
I have some questions:
- Does it make sense to compute the threshold for each `data_window` independently, instead of using a fixed `speech_energy_threshold`? I would do that by computing the energy of the `data_window` signal, normalizing it, and taking its mean value. If this value is 0.0, I can label that segment as silence.
- Is there a clever way to choose parameters like `sample_window`, `sample_overlap`, and `speech_window` that would be more appropriate for singing-voice signals?
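Here is a minimal sketch of a track-relative threshold. It assumes the separated vocal is loaded as a mono NumPy float array `signal` with sample rate `sr`. All function and parameter names in it (`frame_ms`, `hop_ms`, `top_db`, `floor_db`, `min_gap_s`, `min_voice_s`) are hypothetical and do not come from the original script.

There is one caveat about the per-window idea. If each window's energy is normalized by that window's own maximum, the normalized mean is positive for any window that is not exactly zero, so a `== 0.0` test only catches digital silence. Separated stems often contain low-level residue in the gaps instead. Comparing each frame against the loudest frame of the whole track keeps the threshold adaptive without that problem. This is similar in spirit to the `top_db` parameter of `librosa.effects.split`. The sketch also adds an absolute floor so that a near-silent track does not produce segments.

```python
import numpy as np


def frame_rms_db(signal, sr, frame_ms=40.0, hop_ms=20.0):
    """Per-frame RMS level in dB. frame_ms/hop_ms are hypothetical defaults."""
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    if len(signal) < frame_len:  # pad very short inputs to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = np.arange(n_frames) * hop
    # Gather frames into a (n_frames, frame_len) array via fancy indexing.
    frames = signal[starts[:, None] + np.arange(frame_len)[None, :]]
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-12), starts, frame_len


def voice_segments(signal, sr, frame_ms=40.0, hop_ms=20.0, top_db=40.0,
                   floor_db=-60.0, min_gap_s=0.3, min_voice_s=0.2):
    """Return a list of (start_s, end_s) voice segments.

    A frame counts as voice if its level is within top_db dB of the loudest
    frame in the whole track AND above the absolute floor_db. Gaps shorter
    than min_gap_s (e.g. breaths between phrases) are bridged, and segments
    shorter than min_voice_s are dropped. All defaults are assumptions.
    """
    db, starts, frame_len = frame_rms_db(signal, sr, frame_ms, hop_ms)
    active = (db > db.max() - top_db) & (db > floor_db)

    # Collect runs of active frames as (start_sample, end_sample).
    runs = []
    run_start = None
    for i, is_active in enumerate(active):
        if is_active and run_start is None:
            run_start = starts[i]
        elif not is_active and run_start is not None:
            runs.append((run_start, starts[i - 1] + frame_len))
            run_start = None
    if run_start is not None:
        runs.append((run_start, starts[-1] + frame_len))

    # Bridge short gaps between runs, then drop segments that are too short.
    merged = []
    for s, e in runs:
        if merged and (s - merged[-1][1]) / sr < min_gap_s:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [(s / sr, e / sr) for s, e in merged if (e - s) / sr >= min_voice_s]
```

Suppose the script's `sample_window` and `sample_overlap` describe the analysis frame and its overlap. They would then correspond roughly to `frame_ms` and `frame_ms - hop_ms` here, and frames of a few tens of milliseconds are a common starting point. For singing, `min_gap_s` matters because sustained notes are often separated by short breaths that probably should not split a phrase. Tune it, `top_db`, and `floor_db` by listening to a few tracks rather than treating these defaults as correct.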
Thanks a lot!