HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 0
Authors
Swietojanski, Pawel [1]
Ghoshal, Arnab [1]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
Source
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting recognition;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and a 4-6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
Pages: 285-290
Page count: 6