Speech waveform compression using robust adaptive voice activity detection for nonstationary noise
Document Type
Article
Publication Date
5-26-2008
Abstract
Voice activity detection (VAD) is crucial in a wide variety of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector based on Mel-energy features and an adaptive threshold tied to signal-to-noise ratio (SNR) estimates. In this paper, we first establish the robustness of a Bayes classifier using Mel-energy features over one using Fourier spectral features in various noise environments. We then design an algorithm that combines a dynamic Mel-energy estimator with an adaptive threshold depending on the SNR estimates. In addition, a realignment scheme is incorporated to correct sparse and spurious noise estimates. Extensive simulations evaluate the performance of the proposed VAD method, with comparisons against two representative existing schemes: a VAD based on the likelihood ratio test with Fourier spectral energy features and one based on enhanced time-frequency parameters. Three types of noise, namely white noise (stationary), babble noise (nonstationary), and vehicular noise (nonstationary), were artificially added for our experiments. The proposed VAD algorithm significantly outperforms the other methods, as illustrated by the corresponding receiver operating characteristic (ROC) curves. Finally, we demonstrate one major application, speech waveform compression combined with our new robust VAD scheme, and quantify its effectiveness in terms of compression efficiency.
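To make the idea concrete, the following is a minimal illustrative sketch of a frame-based VAD that thresholds log Mel-band energy against a noise estimate, with the decision margin shrinking as the estimated SNR grows. It is not the authors' algorithm: the filterbank size, smoothing factor `alpha`, the margin schedule, and the assumption that the first frame is noise-only are all our own simplifications for illustration, and the realignment step from the paper is omitted.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank over the rfft bins (standard construction)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def vad(signal, sr=8000, frame_len=256, hop=128, n_mel=20, alpha=0.98):
    """Per-frame speech/non-speech decisions from log Mel energy.

    The threshold is the running noise level plus a margin that
    decreases with the estimated SNR (an illustrative schedule,
    not the one from the paper).
    """
    fb = mel_filterbank(n_mel, frame_len, sr)
    window = np.hamming(frame_len)
    noise_level = None
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spec = np.abs(np.fft.rfft(frame)) ** 2
        log_e = np.log((fb @ spec).sum() + 1e-12)   # total log Mel energy
        if noise_level is None:
            noise_level = log_e                     # assumption: first frame is noise
        snr_est = log_e - noise_level               # crude log-domain SNR estimate
        margin = max(3.0 - 0.1 * snr_est, 0.5)      # adaptive margin: smaller at high SNR
        is_speech = log_e > noise_level + margin
        if not is_speech:                           # update noise only in non-speech frames
            noise_level = alpha * noise_level + (1 - alpha) * log_e
        decisions.append(is_speech)
    return np.array(decisions)
```

In a compression front end like the one the abstract describes, frames flagged as non-speech by such a detector can be discarded or coded at a much lower rate, which is where the compression gain comes from.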
Publication Source (Journal or Book title)
EURASIP Journal on Audio, Speech, and Music Processing
Recommended Citation
Wu, H., & Syed, W. (2008). Speech waveform compression using robust adaptive voice activity detection for nonstationary noise. EURASIP Journal on Audio, Speech, and Music Processing, 2008. https://doi.org/10.1155/2008/639839