HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

被引:0
|
作者
Swietojanski, Pawel [1 ]
Ghoshal, Arnab [1 ]
Renals, Steve [1 ]
机构
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
来源
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2013年
基金
英国工程与自然科学研究理事会;
关键词
Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting recognition;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4-6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover significant part of accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
引用
收藏
页码:285 / 290
页数:6
相关论文
共 50 条
  • [31] A COMPARISON BETWEEN DEEP NEURAL NETS AND KERNEL ACOUSTIC MODELS FOR SPEECH RECOGNITION
    Lu, Zhiyun
    Guo, Dong
    Garakani, Alireza Bagheri
    Liu, Kuan
    May, Avner
    Bellet, Aurelien
    Fan, Linxi
    Collins, Michael
    Kingsbury, Brian
    Picheny, Michael
    Sha, Fei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5070 - 5074
  • [32] Distant Speech Recognition Using a Microphone Array Network
    Nakano, Alberto Yoshihiro
    Nakagawa, Seiichi
    Yamamoto, Kazumasa
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2451 - 2462
  • [33] Automatic context window composition for distant speech recognition
    Ravanelli, Mirco
    Omologo, Maurizio
    SPEECH COMMUNICATION, 2018, 101 : 34 - 44
  • [34] A NETWORK OF DEEP NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
    Ravanelli, Mirco
    Brakel, Philemon
    Omologo, Maurizio
    Bengio, Yoshua
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4880 - 4884
  • [35] Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition
    Gao, Tian
    Du, Jun
    Xu, Yong
    Liu, Cong
    Dai, Li-Rong
    Lee, Chin-Hui
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016,
  • [36] ACOUSTIC MODELING FOR DISTANT MULTI-TALKER SPEECH RECOGNITION WITH SINGLE- AND MULTI-CHANNEL BRANCHES
    Kanda, Naoyuki
    Fujita, Yusuke
    Horiguchi, Shota
    Ikeshita, Rintaro
    Nagamatsu, Kenji
    Watanabe, Shinji
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6630 - 6634
  • [37] Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition
    Tian Gao
    Jun Du
    Yong Xu
    Cong Liu
    Li-Rong Dai
    Chin-Hui Lee
    EURASIP Journal on Advances in Signal Processing, 2016
  • [38] Improving Acoustic Models for Dysarthric Speech Recognition using Time Delay Neural Networks
    Misbullah, Alim
    Lin, Hai-Hsing
    Chang, Chia-Yuan
    Yeh, Hsiu-Wei
    Weng, Ko-Cheng
    2020 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICELTICS 2020), 2020, : 118 - 121
  • [39] Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data
    Kosaka, Tetsuo
    Saeki, Kazuya
    Aizawa, Yoshitaka
    Kato, Masaharu
    Nose, Takashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (03) : 363 - 373
  • [40] MULTICHANNEL AUDIO FRONT-END FOR FAR-FIELD AUTOMATIC SPEECH RECOGNITION
    Chhetri, Amit
    Hilmes, Philip
    Kristjansson, Trausti
    Chu, Wai
    Mansour, Mohamed
    Li, Xiaoxue
    Zhang, Xianxian
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 1527 - 1531