SINGLE AND MULTI-CHANNEL APPROACHES FOR DISTANT SPEECH RECOGNITION UNDER NOISY REVERBERANT CONDITIONS: I2R'S SYSTEM DESCRIPTION FOR THE ASpIRE CHALLENGE

Times cited: 0
Authors
Dennis, Jonathan [1]
Tran Huy Dat [1]
Affiliation
[1] A*STAR, Inst Infocomm Res, 1 Fusionopolis Way, Singapore 138632, Singapore
Source
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2015
Keywords
ASpIRE Challenge; mismatched conditions; reverberation; distant speech recognition; beamforming
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. The main components of the system are a front-end processing system consisting of a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach based on a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-condition Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated from the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system improved the WER by over 6% absolute, giving the best evaluation result of 38.5%.
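The abstract does not specify how the distributed beamformer computes its adaptive weights or decides which channels to eliminate. As an illustration only, the following is a minimal sketch of a generic weighted delay-and-sum beamformer with channel elimination, assuming that time delays are estimated with GCC-PHAT against a chosen reference microphone, that each channel is weighted by its GCC-PHAT peak height, and that channels whose peak falls below a threshold are discarded. The function names, the reference-channel choice, and the `min_peak` threshold are hypothetical; this is not the I2R system's actual weighting or elimination rule.

```python
import numpy as np


def gcc_phat(sig, ref, max_shift):
    """Estimate the integer delay of `sig` relative to `ref` with GCC-PHAT.

    Returns (delay_in_samples, peak_height). A positive delay means `sig`
    lags `ref`; the peak height (about 1 for a clean delayed copy, near 0
    for unrelated noise) serves as a rough per-channel reliability score.
    """
    n = len(sig) + len(ref)                  # zero-pad so the correlation is linear
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index k corresponds to lag k - max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    k = int(np.argmax(cc))
    return k - max_shift, float(cc[k])


def shift(x, d):
    """Advance x by d samples (d > 0) or delay it (d < 0), zero-filling the gap."""
    y = np.zeros_like(x)
    if d > 0:
        y[:-d] = x[d:]
    elif d < 0:
        y[-d:] = x[:d]
    else:
        y[:] = x
    return y


def weighted_delay_and_sum(channels, ref_idx=0, max_shift=32, min_peak=0.2):
    """Hypothetical scheme: align each channel to the reference, drop channels
    with a weak GCC-PHAT peak, and average the rest weighted by peak height."""
    ref = channels[ref_idx]
    kept, delays, weights = [], [], []
    for i, ch in enumerate(channels):
        d, peak = gcc_phat(ch, ref, max_shift)
        if i != ref_idx and peak < min_peak:
            continue                         # channel elimination
        kept.append(i)
        delays.append(d)
        weights.append(peak)
    w = np.asarray(weights) / np.sum(weights)  # normalise weights to sum to 1
    out = np.zeros(len(ref))
    for wi, i, d in zip(w, kept, delays):
        out += wi * shift(channels[i], d)    # advance lagging channels onto ref
    return out, kept, delays


# Toy check: one delayed, slightly noisy copy of the reference and one
# uncorrelated channel standing in for a faulty or badly placed microphone.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
near = shift(ref, -5) + 0.1 * rng.standard_normal(4000)  # ref delayed by 5 samples
faulty = rng.standard_normal(4000)                       # unrelated signal
out, kept, delays = weighted_delay_and_sum([ref, near, faulty])
print(kept, delays)
```

With this toy input the uncorrelated channel should be eliminated and the delayed channel realigned before averaging. Weighting by the GCC-PHAT peak is only one plausible reliability measure; a practical distributed-array front end typically also re-estimates delays over short segments to follow time-varying conditions, which this sketch ignores.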
Pages: 518-524
Number of pages: 7