An effective subband OSF-based VAD with noise reduction for robust speech recognition

Cited by: 54
Authors
Ramírez, J [1]
Segura, JC [1]
Benítez, C [1]
de la Torre, A [1]
Rubio, A [1]
Affiliation
[1] Univ Granada, Dept Signal Theory Networking & Commun, E-18071 Granada, Spain
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005, Vol. 13, No. 06
Keywords
noise reduction; robust speech recognition; speech/nonspeech detection; subband order statistics filters
DOI
10.1109/TSA.2005.853212
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
An effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach is based on the determination of the speech/nonspeech divergence by means of specialized order statistics filters (OSFs) working on the subband log-energies. This algorithm differs from many others in the way the decision rule is formulated. Instead of making the decision based on the current frame alone, it applies OSFs to the subband log-energies, which significantly reduces the error probability when discriminating speech from nonspeech in a noisy signal. Clear improvements in speech/nonspeech discrimination accuracy demonstrate the effectiveness of the proposed VAD. It is shown that an increase of the OSF order leads to a better separation of the speech and noise distributions, thus allowing a more effective discrimination and a tradeoff between complexity and performance. The algorithm also incorporates a noise reduction block working in tandem with the VAD, which is shown to further improve its accuracy. A previous noise reduction block also improves the accuracy in detecting speech and nonspeech. The experimental analysis carried out on the AURORA databases and tasks provides an extensive performance evaluation together with an exhaustive comparison to standard VADs such as ITU G.729, GSM AMR, and ETSI AFE for distributed speech recognition (DSR), as well as other recently reported VADs.
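The general idea in the abstract, a decision taken from order statistics of the subband log-energies over neighboring frames rather than from the current frame alone, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the decision rule, the quantile used, the noise estimation procedure, or any thresholds. The function names, the `noise_floor` input (assumed to be known in advance), the window length, the quantile `q`, and the `threshold` are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def order_statistic(values: Sequence[float], q: float) -> float:
    """Return the order statistic nearest to the q-quantile (0 <= q <= 1)."""
    ordered = sorted(values)
    idx = int(q * (len(ordered) - 1) + 0.5)  # nearest rank
    return ordered[idx]


def osf_vad(
    log_energies: Sequence[Sequence[float]],
    noise_floor: Sequence[float],
    half_window: int = 2,   # hypothetical: 2 * half_window + 1 frames per filter
    q: float = 0.5,         # hypothetical: median as the order statistic
    threshold: float = 3.0, # hypothetical decision threshold
) -> List[bool]:
    """Illustrative speech/nonspeech decision per frame.

    log_energies: one list of K subband log-energies per frame.
    noise_floor:  K subband noise log-energies (assumed known here).
    """
    n_frames = len(log_energies)
    n_bands = len(noise_floor)
    decisions = []
    for t in range(n_frames):
        # Frames around t, clipped at the signal boundaries.
        lo = max(0, t - half_window)
        hi = min(n_frames, t + half_window + 1)
        divergence = 0.0
        for b in range(n_bands):
            window = [log_energies[i][b] for i in range(lo, hi)]
            # Order statistic of the subband log-energy minus the noise level.
            divergence += order_statistic(window, q) - noise_floor[b]
        # Average over subbands and compare with the threshold.
        decisions.append(divergence / n_bands > threshold)
    return decisions


# Toy input: 5 noise frames, 5 high-energy frames, 5 noise frames, 2 subbands.
frames = [[0.0, 0.0]] * 5 + [[10.0, 10.0]] * 5 + [[0.0, 0.0]] * 5
decisions = osf_vad(frames, noise_floor=[0.0, 0.0])
```

In this toy setting an isolated high-energy frame surrounded by noise is rejected, because the median over the window ignores a single outlier. That is one intuitive reason a multi-frame order statistic can be more robust than a single-frame decision.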
Pages: 1119-1129
Page count: 11