HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 0

Authors:

Swietojanski, Pawel [1]

Ghoshal, Arnab [1]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2013

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4% and 6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.


Pages: 285-290

Page count: 6

50 items in total

[1] Large Vocabulary Continuous Speech Recognition With Reservoir-Based Acoustic Models
Triefenbach, Fabian
Demuynck, Kris
Martens, Jean-Pierre
IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 311 - 315
[2] Acoustic Event Mixing to Multichannel AMI Data for Distant Speech Recognition and Acoustic Event Classification Benchmarking
Astapov, Sergei
Svirskiy, Gleb
Lavrentyev, Aleksandr
Prisyach, Tatyana
Popov, Dmitriy
Ubskiy, Dmitriy
Kabarov, Vladimir
SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 31 - 42
[3] Improved Frequency Modulation Features for Multichannel Distant Speech Recognition
Rodomagoulakis, Isidoros
Maragos, Petros
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 841 - 849
[4] ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
Prudnikov, Alexey
Korenevsky, Maxim
Aleinik, Sergei
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 401 - 408
[5] Multi-domain adversarial training of neural network acoustic models for distant speech recognition
Mirsamadi, Seyedmahdad
Hansen, John H. L.
SPEECH COMMUNICATION, 2019, 106 : 21 - 30
[6] LEARNING FEATURE MAPPING USING DEEP NEURAL NETWORK BOTTLENECK FEATURES FOR DISTANT LARGE VOCABULARY SPEECH RECOGNITION
Himawan, Ivan
Motlicek, Petr
Imseng, David
Potard, Blaise
Kim, Namhoon
Lee, Jaewon
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4540 - 4544
[7] Combating Reverberation in Large Vocabulary Continuous Speech Recognition
Mitra, Vikramjit
Van Hout, Julien
McLaren, Mitchell
Wang, Wen
Graciarena, Martin
Vergyri, Dimitra
Franco, Horacio
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2449 - 2453
[8] Distant speech recognition: Bridging the gaps
McDonough, John
Woelfel, Matthias
2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 109 - +
[9] NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
Renals, Steve
Swietojanski, Pawel
2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 172 - 176
[10] Open Source German Distant Speech Recognition: Corpus and Acoustic Model
Radeck-Arneth, Stephan
Milde, Benjamin
Lange, Arvid
Gouvea, Evandro
Radomski, Stefan
Muehlhaeuser, Max
Biemann, Chris
TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 480 - 488
