SINGLE AND MULTI-CHANNEL APPROACHES FOR DISTANT SPEECH RECOGNITION UNDER NOISY REVERBERANT CONDITIONS: I2R'S SYSTEM DESCRIPTION FOR THE ASpIRE CHALLENGE

Cited by: 0
Authors
Dennis, Jonathan [1]
Tran Huy Dat [1]
Affiliations
[1] ASTAR, Inst Infocomm Res, 1 Fusionopolis Way, Singapore 138632, Singapore
Source
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2015
Keywords
ASpIRE Challenge; mismatched conditions; reverberation; distant speech recognition; beamforming
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech Recognition In Reverberant Environments) challenge. The front-end processing system has three main components: a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach using a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-conditional Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated from the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system achieved an absolute WER improvement of over 6%, giving the best evaluation result of 38.5%.
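The two WER figures quoted in the abstract can be compared as a short worked calculation. The sketch below is only illustrative arithmetic on the reported numbers (44.8% and 38.5%); the function name `wer_reduction` is hypothetical and is not taken from the paper.

```python
def wer_reduction(baseline_wer, improved_wer):
    """Return (absolute, relative) WER reduction, both expressed in percent.

    Hypothetical helper for illustration; inputs are WER values in percent.
    """
    absolute = baseline_wer - improved_wer          # percentage points
    relative = 100.0 * absolute / baseline_wer      # percent of the baseline
    return absolute, relative


# Figures reported in the abstract: single-microphone vs. multi-microphone.
absolute, relative = wer_reduction(44.8, 38.5)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

Read this way, the "over 6%" improvement is an absolute reduction of about 6.3 percentage points, which corresponds to roughly a 14% relative reduction against the single-microphone result.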
Pages: 518-524
Page count: 7