An effective cluster-based model for robust speech detection and speech recognition in noisy environments

Cited by: 21
Authors
Gorriz, J. M. [1]
Ramirez, J.
Segura, J. C.
Puntonet, C. G.
Affiliations
[1] Univ Granada, Dept Signal Theory, Granada, Spain
[2] Univ Granada, Dept Comp Architecture & Technol, Granada, Spain
DOI
10.1121/1.2208450
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems operating in noisy environments. The proposed method is based on a hard-decision clustering approach in which a set of prototypes is used to characterize the noisy channel. The presence of speech is detected by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme has a reduced computational cost, making it suitable for real-time applications such as automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases to assess the performance of the algorithm and to compare it with existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition, as well as over a representative set of recently reported VAD algorithms. (c) 2006 Acoustical Society of America.
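The detection scheme summarized in the abstract can be illustrated with a minimal Python sketch under stated assumptions. It uses hypothetical per-frame feature vectors (for example, subband log-energies), builds the noise model by hard-decision k-means on noise-only frames, and declares speech when the distance to the nearest noise prototype, averaged over a window of neighboring frames, exceeds a threshold. The helper names (`kmeans_prototypes`, `noise_distance`, `detect_speech`), the nearest-prototype distance, the centered averaging window, and the `threshold` and `context` parameters are illustrative assumptions, not the authors' exact formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_prototypes(noise_frames: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Hard-decision k-means: every noise frame is assigned to exactly one prototype.

    Initialisation from evenly spaced frames is an assumption that keeps the
    sketch deterministic; it is not taken from the paper.
    """
    if len(noise_frames) < k:
        raise ValueError("need at least k noise frames")
    step = len(noise_frames) // k
    prototypes = [list(noise_frames[i * step]) for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for frame in noise_frames:
            # Hard decision: assign the frame to its single nearest prototype.
            nearest = min(range(k), key=lambda c: euclidean(frame, prototypes[c]))
            clusters[nearest].append(frame)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous prototype
                prototypes[c] = [sum(col) / len(members) for col in zip(*members)]
    return prototypes


def noise_distance(frame: Sequence[float], prototypes: List[Vector]) -> float:
    """Distance from one observation vector to the cluster-based noise model
    (assumed here to be the distance to the nearest prototype)."""
    return min(euclidean(frame, p) for p in prototypes)


def detect_speech(frames: List[Vector], prototypes: List[Vector],
                  threshold: float, context: int = 2) -> List[bool]:
    """Return one flag per frame; True means speech is declared.

    The statistic is the noise-model distance averaged over the 2*context+1
    frames centred on the current one (the window is truncated at the edges).
    """
    dists = [noise_distance(f, prototypes) for f in frames]
    decisions = []
    for t in range(len(frames)):
        window = dists[max(0, t - context): t + context + 1]
        decisions.append(sum(window) / len(window) > threshold)
    return decisions


if __name__ == "__main__":
    # Toy noise with two modes, e.g. a quiet and a louder background.
    noise = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
    protos = kmeans_prototypes(noise, k=2)
    # 5 quiet-noise frames, 5 "speech" frames, 5 loud-noise frames.
    frames = [[0.05, 0.05]] * 5 + [[20.0, 20.0]] * 5 + [[5.0, 5.0]] * 5
    flags = detect_speech(frames, protos, threshold=3.0, context=1)
    print("".join("S" if f else "." for f in flags))  # prints ....SSSSSSS....
```

In this toy run the louder background frames are not flagged, because they lie close to one of the noise prototypes; a single-threshold energy detector would likely mark them as speech. The window average also extends the speech decision by one frame on either side of the burst, which is the smoothing effect that contextual information is meant to provide.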
Pages: 470-481
Number of pages: 12