Noise robust model-based Voice Activity Detection

Times cited: 0
Authors
de la Torre, Angel [1]
Ramirez, Javier [1]
Benitez, Carmen [1]
Segura, Jose C. [1]
Garcia, Luz [1]
Rubio, Antonio J. [1]
Affiliations
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
voice activity detection (VAD); vector Taylor series approach (VTS); Gaussian mixture; Wiener filtering
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a model-based VAD derived from the Vector Taylor Series (VTS) approach. A Gaussian mixture trained on clean speech provides the decision rule for speech/non-speech detection. In addition, the VTS approach adapts the Gaussian mixture to the noise conditions, yielding stable performance over a wide range of SNRs. We have evaluated both its speech/non-speech detection ability and its application to robust speech recognition. Compared with other VAD methods, the proposed VAD shows the best trade-off in speech/non-speech detection. When applied to Wiener filtering and frame dropping, it also provides the best recognition results.
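The abstract does not give the adaptation equations or the decision rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes log-spectral features, a diagonal-covariance Gaussian mixture, the commonly used zeroth-order VTS mean shift mu_y = log(exp(mu_x) + exp(mu_n)) with variances left unchanged, and a simple log-likelihood ratio against a single-Gaussian noise model. All function names, parameters, and the threshold are hypothetical.

```python
import math


def vts_adapt_means(clean_means, noise_mean):
    """Zeroth-order VTS mean shift (assumed form): log(exp(mu_x) + exp(mu_n)).

    Written as max + log1p(exp(-|diff|)) so that large values do not overflow.
    """
    return [
        [max(mx, mn) + math.log1p(math.exp(-abs(mx - mn)))
         for mx, mn in zip(mean, noise_mean)]
        for mean in clean_means
    ]


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def is_speech(frame, weights, adapted_means, variances,
              noise_mean, noise_var, threshold=0.0):
    """Hypothetical decision rule: adapted-mixture vs. noise log-likelihood ratio."""
    comps = [
        math.log(w) + log_gauss(frame, m, v)
        for w, m, v in zip(weights, adapted_means, variances)
    ]
    top = max(comps)
    # Log-sum-exp over the mixture components.
    speech_ll = top + math.log(sum(math.exp(c - top) for c in comps))
    noise_ll = log_gauss(frame, noise_mean, noise_var)
    return speech_ll - noise_ll > threshold
```

In this sketch, a clean-speech component far above the noise level is left almost unchanged, while a component at or below the noise level is pulled up to roughly the noise level, which is one way to see why adaptation could keep the decision stable as the SNR drops.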
Pages: 1954-1957
Page count: 4