Adaptively reserved likelihood ratio-based robust voice activity detection with sub-band double features

Cited by: 0
Authors
He W. [1]
He Q. [1]
Wu J. [1]
Yang J. [1]
Affiliations
[1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou
Source
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | 2016 / Vol. 38 / No. 11
Funding
National Natural Science Foundation of China
Keywords
Likelihood ratio; Low signal-to-noise ratio; Sub-band zero crossing rate; Voice Activity Detection (VAD)
DOI
10.11999/JEIT160157
Abstract
To improve the accuracy of Voice Activity Detection (VAD) in low Signal-to-Noise Ratio (SNR) environments, this paper presents an adaptively reserved likelihood ratio VAD method based on sub-band double features. The method uses the sub-band autocorrelation function and the sub-band zero crossing rate to set the reserved weight. The reserved threshold is estimated adaptively from past VAD results and their sub-band feature parameters. Experiments show promising performance in comparison with similar algorithms: VAD accuracy improves by 1.2%, 7.2%, and 8.1% in stationary white noise at 10 dB, 0 dB, and -10 dB, respectively, and by 1.6% and 3.4% in non-stationary Babble noise at 10 dB and 0 dB, respectively. The method is also applied to a 2.4 kbps low bit rate vocoder, where the Perceptual Evaluation of Speech Quality (PESQ) score improves by 0.098~0.153 in white noise and by 0.157~0.186 in Babble noise. © 2016, Science Press. All rights reserved.
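For orientation only, the two per-band features named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the frame is assumed to be the output of a single band-pass filter (the filter bank itself is omitted), and the record does not describe how the paper maps these features to the reserved weight or threshold.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    `frame` is assumed to hold the samples of one sub-band
    (hypothetical input; the band-pass filtering is not shown).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def normalized_autocorrelation(frame, lag):
    """Autocorrelation of the frame at `lag`, divided by the frame energy."""
    energy = sum(x * x for x in frame)
    if energy == 0 or lag >= len(frame):
        return 0.0
    acc = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return acc / energy


# A toy alternating frame changes sign at every sample.
toy = [1.0, -1.0, 1.0, -1.0]
zcr = zero_crossing_rate(toy)
r1 = normalized_autocorrelation(toy, 1)
r2 = normalized_autocorrelation(toy, 2)
```

Noise-like frames tend to show a high zero crossing rate and weak autocorrelation at nonzero lags, while voiced speech tends to show the opposite, which is one common reason such features are paired in VAD work.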
Pages: 2879-2886
Number of pages: 7