Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition

Cited by: 6

Authors:

Gao, Tian [1]

Du, Jun [1]

Xu, Yong [1]

Liu, Cong [2]

Dai, Li-Rong [1]

Lee, Chin-Hui [3]

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Jinzhai Road, Hefei, People's Republic of China

[2] iFlytek Co., Ltd., iFlytek Research, Hefei, People's Republic of China

[3] Georgia Institute of Technology, Atlanta, GA 30332, USA

Source:

EURASIP Journal on Advances in Signal Processing | 2016

Funding:

National Natural Science Foundation of China

Keywords:

Distant speech recognition; Dereverberation; Joint training; Deep neural network; Beamforming; Front-end; Separation

DOI:

10.1186/s13634-016-0384-5

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We explore joint training strategies of DNNs for simultaneous dereverberation and acoustic modeling to improve the performance of distant speech recognition. There are two key contributions. First, a new DNN structure incorporating both dereverberated and original reverberant features is shown to effectively improve recognition accuracy over the conventional one that uses only dereverberated features as the input. Second, in most of the simulated reverberant environments for training data collection and DNN-based dereverberation, the resource data and learning targets are high-quality clean speech. With our joint training strategy, we can relax this constraint by using large-scale, diverse real close-talking data as the targets, which are easy to collect from mobile internet users via many speech-enabled applications, and we find this scenario even more effective. Our experiments on a Mandarin speech recognition task with 2000 h of training data show that the proposed framework achieves relative word error rate reductions of 9.7% and 8.6% over the multi-condition training systems for the single-channel case and the multi-channel case with beamforming, respectively. Furthermore, significant gains are consistently observed over the pre-processing approach that simply uses DNN-based dereverberation.


Pages: 13
