HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

Authors
Swietojanski, Pawel [1]
Ghoshal, Arnab [1]
Renals, Steve [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
Source
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2013
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting recognition;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models to far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe an absolute word error rate (WER) reduction of up to 8% over a discriminatively trained GMM baseline when using a single distant microphone, and an absolute WER reduction of 4-6% when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find that they can recover a significant part of the accuracy gap between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup when it is trained with data from other microphones.
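In a DNN-HMM hybrid, the network is typically trained to output posterior probabilities P(s|x) of tied HMM states given an acoustic frame, while the HMM decoder needs emission likelihoods p(x|s). The usual bridge is Bayes' rule: dividing each posterior by the state prior P(s) gives a likelihood scaled by p(x), which is constant across states for a given frame and can be ignored during decoding. The Python sketch below illustrates this generic hybrid recipe; the abstract does not spell out these details, so the function names, the estimation of priors from a frame-level state alignment, and the toy numbers are all assumptions for illustration, not the paper's exact configuration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: log P(s|x) for one frame of DNN outputs."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def state_priors(alignment: Sequence[int], num_states: int,
                 floor: float = 1e-8) -> List[float]:
    """Estimate P(s) as relative state frequencies in a frame-level alignment.

    Assumption: the alignment comes from an existing system (e.g. a GMM-HMM).
    """
    if not alignment:
        raise ValueError("alignment must contain at least one frame")
    counts = [0] * num_states
    for s in alignment:
        counts[s] += 1
    # Floor unseen states so that log P(s) below stays finite.
    return [max(c / len(alignment), floor) for c in counts]


def scaled_log_likelihoods(logits: Sequence[float],
                           priors: Sequence[float]) -> List[float]:
    """Hybrid conversion: log p(x|s) = log P(s|x) - log P(s) + log p(x).

    log p(x) is identical for every state in a given frame, so it is dropped;
    the result is a 'scaled' log-likelihood usable as an HMM emission score.
    """
    return [lp - math.log(p) for lp, p in zip(log_softmax(logits), priors)]


if __name__ == "__main__":
    # Toy example with 3 tied states (hypothetical numbers, not from the paper).
    priors = state_priors([0, 0, 0, 1, 2, 2], num_states=3)
    frame_logits = [2.0, 1.0, 0.5]
    posteriors = log_softmax(frame_logits)
    scores = scaled_log_likelihoods(frame_logits, priors)
    print("best by posterior:", max(range(3), key=lambda i: posteriors[i]))
    print("best by scaled likelihood:", max(range(3), key=lambda i: scores[i]))
```

Running the toy example reports state 0 as best by posterior but state 1 as best by scaled likelihood: dividing by P(s) = 1/6 boosts the rarer state relative to state 0, whose prior is 1/2. This prior division is what lets a discriminatively trained classifier stand in for the GMM emission densities inside an otherwise unchanged HMM decoder.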
Pages: 285-290
Page count: 6
Related papers
  • [1] Large Vocabulary Continuous Speech Recognition With Reservoir-Based Acoustic Models
    Triefenbach, Fabian
    Demuynck, Kris
    Martens, Jean-Pierre
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 311 - 315
  • [2] Acoustic Event Mixing to Multichannel AMI Data for Distant Speech Recognition and Acoustic Event Classification Benchmarking
    Astapov, Sergei
    Svirskiy, Gleb
    Lavrentyev, Aleksandr
    Prisyach, Tatyana
    Popov, Dmitriy
    Ubskiy, Dmitriy
    Kabarov, Vladimir
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 31 - 42
  • [3] Improved Frequency Modulation Features for Multichannel Distant Speech Recognition
    Rodomagoulakis, Isidoros
    Maragos, Petros
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 841 - 849
  • [4] ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
    Prudnikov, Alexey
    Korenevsky, Maxim
    Aleinik, Sergei
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 401 - 408
  • [5] Multi-domain adversarial training of neural network acoustic models for distant speech recognition
    Mirsamadi, Seyedmahdad
    Hansen, John H. L.
    SPEECH COMMUNICATION, 2019, 106 : 21 - 30
  • [6] LEARNING FEATURE MAPPING USING DEEP NEURAL NETWORK BOTTLENECK FEATURES FOR DISTANT LARGE VOCABULARY SPEECH RECOGNITION
    Himawan, Ivan
    Motlicek, Petr
    Imseng, David
    Potard, Blaise
    Kim, Namhoon
    Lee, Jaewon
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4540 - 4544
  • [7] Combating Reverberation in Large Vocabulary Continuous Speech Recognition
    Mitra, Vikramjit
    Van Hout, Julien
    McLaren, Mitchell
    Wang, Wen
    Graciarena, Martin
    Vergyri, Dimitra
    Franco, Horacio
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2449 - 2453
  • [8] Distant speech recognition: Bridging the gaps
    McDonough, John
    Woelfel, Matthias
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 109 - +
  • [9] NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
    Renals, Steve
    Swietojanski, Pawel
    2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 172 - 176
  • [10] Open Source German Distant Speech Recognition: Corpus and Acoustic Model
    Radeck-Arneth, Stephan
    Milde, Benjamin
    Lange, Arvid
    Gouvea, Evandro
    Radomski, Stefan
    Muehlhaeuser, Max
    Biemann, Chris
    TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 480 - 488