Adaptive Voice Activity Detection Based on Long-Term Information

Cited by: 0
Authors
Yang X.-K. [1]
Qu D. [1]
Zhang W.-L. [1]
Yan H.-G. [1]
Affiliation
[1] PLA Information Engineering University, Zhengzhou, 450001, Henan
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2018, Vol. 46, No. 4
Keywords
Adaptive; Auditory filter bank; Long-term information; Voice activity detection
DOI
10.3969/j.issn.0372-2112.2018.04.016
Abstract
The long-term information of speech signals performs well in voice activity detection (VAD). Six types of long-term information based on auditory filter banks are proposed, obtained through non-linear spectral decomposition with three different auditory filters. An adaptive VAD algorithm based on these types of long-term information is then proposed. Without additional training data, the algorithm uses data selected from the test signal according to the long-term information to train a speech/non-speech classifier, and then classifies the current test signal frame by frame with that classifier. Experiments on the TIMIT and NOISEX-92 datasets show that the algorithm improves VAD performance, with higher accuracy and stronger robustness in low-SNR noisy environments. Online experiments show that it also performs well under real-time processing conditions. © 2018, Chinese Institute of Electronics. All rights reserved.
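The self-training idea in the abstract (select confident frames from the test signal, train a speech/non-speech classifier on them, then classify every frame) can be illustrated with a minimal sketch. This record does not specify the six auditory-filter-bank features or the classifier, so everything below is an assumption chosen for brevity: frame log-energy stands in for the features, a moving average stands in for the long-term information, and two one-dimensional Gaussians stand in for the classifier. The function names and parameters (`frame_len`, `hop`, `context`, `select_frac`) are hypothetical.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Per-frame log energy (assumed stand-in feature, not the paper's)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    return np.log(np.mean(frames ** 2, axis=1) + 1e-10)

def long_term_average(feature, context=10):
    """Average a per-frame feature over +/- `context` frames.

    Assumed stand-in for the long-term information in the abstract.
    """
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    padded = np.pad(feature, context, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def log_gauss(x, mu, sd):
    """Log-density of a 1-D Gaussian, up to an additive constant."""
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd)

def adaptive_vad(signal, select_frac=0.2):
    """Return a boolean speech decision per frame, with no external training data."""
    feat = frame_log_energy(signal)
    long_term = long_term_average(feat)

    # Step 1: use the long-term cue to pick the most confident frames
    # from the test signal itself (lowest -> non-speech, highest -> speech).
    order = np.argsort(long_term)
    k = max(1, int(select_frac * len(feat)))
    noise_idx, speech_idx = order[:k], order[-k:]

    # Step 2: train a classifier on the selected frames
    # (one Gaussian per class, with a floor on the standard deviation).
    mu_n, sd_n = feat[noise_idx].mean(), max(feat[noise_idx].std(), 0.1)
    mu_s, sd_s = feat[speech_idx].mean(), max(feat[speech_idx].std(), 0.1)

    # Step 3: classify every frame by comparing class log-likelihoods.
    return log_gauss(feat, mu_s, sd_s) > log_gauss(feat, mu_n, sd_n)
```

Calling `adaptive_vad(signal)` on a one-dimensional NumPy array returns one boolean per frame. With the assumed defaults and 16 kHz audio, each frame spans 25 ms with a 10 ms hop. A sketch like this presumes the recording contains both speech and non-speech, since the two extremes of the long-term cue are always treated as the two classes.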
Pages: 878-885
Number of pages: 7