SINGLE AND MULTI-CHANNEL APPROACHES FOR DISTANT SPEECH RECOGNITION UNDER NOISY REVERBERANT CONDITIONS: I2R'S SYSTEM DESCRIPTION FOR THE ASpIRE CHALLENGE

Cited by: 0

Authors:

Dennis, Jonathan [1]

Tran Huy Dat [1]

Affiliations:

[1] ASTAR, Inst Infocomm Res, 1 Fusionopolis Way, Singapore 138632, Singapore

Source:

2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2015

Keywords:

ASpIRE Challenge; mismatched conditions; reverberation; distant speech recognition; beamforming;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. The front-end processing system consists of a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach using a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-conditional Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated from the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system achieved an absolute improvement in WER of over 6%, giving the best evaluation result of 38.5%.
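
To make the reported gain concrete, the short Python sketch below works out the absolute and relative WER change from the two figures quoted in the abstract, taking the abstract's single- versus multi-microphone comparison at face value. The helper name `wer_improvement` is hypothetical and introduced here only for illustration; it is not part of the authors' system.

```python
def wer_improvement(baseline_wer, improved_wer):
    """Return (absolute, relative) WER reduction, both expressed in percent."""
    absolute = baseline_wer - improved_wer          # difference in percentage points
    relative = 100.0 * absolute / baseline_wer      # reduction as a share of the baseline
    return absolute, relative


# Figures quoted in the abstract: 44.8% (single-microphone) and 38.5% (multi-microphone).
absolute, relative = wer_improvement(44.8, 38.5)
print(f"absolute reduction: {absolute:.1f} percentage points")
print(f"relative reduction: {relative:.1f}%")
```

Under these figures the reduction is about 6.3 percentage points, which matches the "over 6%" stated in the abstract when read as an absolute difference.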


Pages: 518-524

Page count: 7

19 references in total

[1] Acoustic beamforming for speaker diarization of meetings [J]. Anguera, Xavier; Wooters, Chuck; Hernando, Javier. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): 2011-2022

[2] [Anonymous], 2011, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, DOI 10.1111/J.1096-3642.2009.00621.X

[3] [Anonymous], 2012, STAT LANGUAGE MODELS

[4] Carnegie Mellon University, 2015, CARN MELL U PRON DIC

[5] Cieri C., 2004, LREC, P69

[6] Dat Tran Huy, 2004, ISCA TUT RES WORKSH

[7] Gillespie BW, 2001, INT CONF ACOUST SPEE, P3701, DOI 10.1109/ICASSP.2001.940646

[8] Harper M, 2015, AUTOMATIC SPEECH REC, P1

[9] GENERALIZED CORRELATION METHOD FOR ESTIMATION OF TIME-DELAY [J]. KNAPP, CH; CARTER, GC. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1976, 24 (04): 320-327

[10] Liao H, 2013, INT CONF ACOUST SPEE, P7947, DOI 10.1109/ICASSP.2013.6639212
