A noise-robust speech recognition approach incorporating normalized speech/non-speech likelihood into hypothesis scores

Cited by: 3
Authors
Oonishi, Tasuku [1 ]
Iwano, Koji [2 ]
Furui, Sadaoki [1 ]
Affiliations
[1] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
[2] Tokyo City Univ, Tsuzuki Ku, Yokohama, Kanagawa 2248551, Japan
Keywords
Speech recognition; Noise robustness; Gaussian mixture model adaptation
DOI
10.1016/j.specom.2012.10.001
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In noisy environments, speech recognition decoders often incorrectly produce speech hypotheses for non-speech periods, and non-speech hypotheses, such as silence or a short pause, for speech periods. Reducing such errors is crucial to improving the performance of speech recognition systems. This paper proposes an approach that uses normalized speech/non-speech likelihoods, calculated with adaptive speech and non-speech GMMs, to weight the scores of recognition hypotheses produced by the decoder. To achieve good decoding performance, the GMMs are adapted to variations in the acoustic characteristics of input utterances and environmental noise using one of two on-line unsupervised adaptation methods: switching Kalman filtering (SKF) or maximum a posteriori (MAP) estimation. Experimental results on real-world in-car speech, the Drivers' Japanese Speech Corpus in a Car Environment (DJSC), and the AURORA-2 database show that the proposed method significantly improves recognition accuracy compared to a conventional approach using front-end voice activity detection (VAD). The results also confirm that the method significantly improves recognition accuracy under a variety of noise and task conditions. (C) 2012 Elsevier B.V. All rights reserved.
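As the abstract describes, the core idea is to weight each hypothesis's decoder score by a normalized speech/non-speech likelihood computed from two GMMs that are kept adapted on-line (the paper uses SKF or MAP estimation). Below is a minimal NumPy sketch of the likelihood normalization, the score weighting, and a MAP-style mean update, assuming diagonal-covariance GMMs; all function names, the weighting factor `alpha`, and the relevance factor `tau` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means, variances: (K, D)."""
    diff = frames[:, None, :] - means[None, :, :]                     # (T, K, D)
    comp = (-0.5 * np.sum(diff**2 / variances
                          + np.log(2.0 * np.pi * variances), axis=2)
            + np.log(weights))                                        # (T, K)
    m = comp.max(axis=1, keepdims=True)                               # log-sum-exp
    return m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))             # (T,)

def speech_confidence(frames, speech_gmm, nonspeech_gmm):
    """Normalized per-frame speech likelihood in (0, 1):
    p(speech) / (p(speech) + p(non-speech)), computed stably."""
    ls = gmm_loglik(frames, *speech_gmm)
    ln = gmm_loglik(frames, *nonspeech_gmm)
    return 1.0 / (1.0 + np.exp(ln - ls))

def weighted_score(decoder_score, frames, speech_gmm, nonspeech_gmm,
                   speech_hypothesis, alpha=1.0):
    """Add a confidence term to the decoder's hypothesis score.

    A speech hypothesis is rewarded by the speech confidence; a
    non-speech hypothesis (silence / short pause) by its complement."""
    conf = speech_confidence(frames, speech_gmm, nonspeech_gmm)
    per_frame = conf if speech_hypothesis else 1.0 - conf
    return decoder_score + alpha * np.sum(np.log(per_frame + 1e-12))

def map_adapt_means(gmm, frames, tau=10.0):
    """MAP re-estimation of GMM means from unlabeled frames
    (relevance-factor form; weights and variances left unchanged)."""
    weights, means, variances = gmm
    diff = frames[:, None, :] - means[None, :, :]
    comp = (-0.5 * np.sum(diff**2 / variances
                          + np.log(2.0 * np.pi * variances), axis=2)
            + np.log(weights))
    comp -= comp.max(axis=1, keepdims=True)
    post = np.exp(comp)
    post /= post.sum(axis=1, keepdims=True)                           # (T, K) responsibilities
    n_k = post.sum(axis=0)                                            # soft counts per component
    sum_x = post.T @ frames                                           # (K, D) weighted data sums
    new_means = (tau * means + sum_x) / (tau + n_k)[:, None]
    return weights, new_means, variances
```

In this sketch, frames that match the speech GMM raise the score of speech hypotheses and lower that of silence/short-pause hypotheses, which is the mechanism by which such confusions in noisy periods would be suppressed.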
Pages: 377-386
Page count: 10
References
17 in total
[1] [Anonymous]. Thesis, Cambridge University.
[2] [Anonymous]. 2002, ETSI ES.
[3] Dixon P.R., Caseiro D.A., Oonishi T., Furui S. The titech large vocabulary WFST speech recognition system. 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, Vols 1 and 2, 2007, p. 443+.
[4] Fujimoto M. 2007, INT CONF ACOUST SPEE, p. 797.
[5] Fujimoto M. 2007, P INT 07, Aug., p. 2933.
[6] Hiraki K. 2008, IEICE Technical Report, p. 93.
[7] Itou K. 1999, Journal of the Acoustical Society of Japan (E), vol. 20, p. 199. DOI: 10.1250/ast.20.199.
[8] Iwano K. 2002, Proc. ICSLP, p. 941.
[9] Metze F. 2002, Proc. ICSLP, Denver, p. 2133.
[10] Oonishi T. 2010, 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Vols 3 and 4, p. 3122.