Robust voice activity detection directed by noise classification

Cited by: 9
Authors
Saeedi, Jamal [1]
Ahadi, Seyed Mohammad [1]
Faez, Karim [1]
Affiliations
[1] Amirkabir University of Technology, Department of Electrical Engineering, Tehran, Iran
Keywords
Voice activity detection; Perceptual wavelet packet transform; Noise classification; Support vector machine; Algorithms; Transform
DOI
10.1007/s11760-013-0479-5
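Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract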
In this paper, voice activity detection (VAD) is formulated as a two-class classification problem solved with support vector machines (SVMs). The proposed method combines a noise-robust feature extraction process with SVM models trained under different background noises for speech/non-speech classification. A multi-class SVM additionally classifies the background noise so that the matching SVM model can be selected for VAD. The proposed VAD is tested on TIMIT data artificially distorted by different additive noise types and is compared with state-of-the-art VADs. Experimental results show that the proposed VAD can extract speech activity under poor SNR conditions and is insensitive to variable noise levels.
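The two-stage idea in the abstract (classify the background noise first, then apply the speech/non-speech SVM trained for that noise) can be illustrated with a small sketch. This is not the authors' implementation: the two-dimensional features (`spectral_tilt`, `log_energy`), the two noise labels, the toy training frames, the one-vs-rest scheme, and the plain subgradient-descent linear SVM are all assumptions chosen to keep the example dependency-free. The paper itself uses perceptual wavelet packet features, whose details the abstract does not give.

```python
# Toy sketch of noise-directed VAD with linear SVMs (pure Python, no dependencies).
# Features, noise labels, data, and hyperparameters are illustrative assumptions.

def train_linear_svm(X, y, epochs=3000, lr=0.02, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # this point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(model, x):
    """Signed score of a trained linear model for feature vector x."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest(X, labels):
    """Multi-class stand-in: one binary SVM per class (class vs. the rest)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    """Pick the class whose binary SVM gives the highest score."""
    return max(models, key=lambda c: decision(models[c], x))


# Hypothetical frames: ((spectral_tilt, log_energy), noise label, speech flag)
FRAMES = [
    ((1.0, -2.0), "white", 0), ((1.0, -1.5), "white", 0),
    ((1.0, 0.0), "white", 1), ((1.0, 1.0), "white", 1),
    ((-1.0, 0.0), "babble", 0), ((-1.0, 0.5), "babble", 0),
    ((-1.0, 2.0), "babble", 1), ((-1.0, 3.0), "babble", 1),
]

# Stage 1: noise classifier trained on all frames.
noise_clf = train_one_vs_rest([f for f, _, _ in FRAMES],
                              [n for _, n, _ in FRAMES])

# Stage 2: one speech/non-speech SVM per noise type.
vad_models = {}
for noise in ("white", "babble"):
    subset = [(f, s) for f, n, s in FRAMES if n == noise]
    vad_models[noise] = train_linear_svm([f for f, _ in subset],
                                         [1 if s else -1 for _, s in subset])


def detect(frame):
    """Classify the noise, then apply the VAD model trained for that noise."""
    noise = predict_one_vs_rest(noise_clf, frame)
    is_speech = decision(vad_models[noise], frame) > 0
    return noise, is_speech
```

In this toy data, a frame with `log_energy` 0.25 lies on the speech side under the assumed white-noise model and on the non-speech side under the assumed babble model. That is the kind of noise-dependent decision boundary that motivates selecting the VAD model by noise class.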
Pages: 561-572
Page count: 12