A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition

Authors
Xue, Shaofei [1]
Yan, Zhijie [1]
Yu, Tao [2]
Liu, Zhang [3]
Affiliations
[1] Alibaba Inc, Beijing, People's Republic of China
[2] Alibaba Grp US Inc, Seattle, WA, USA
[3] Alibaba Inc, Hangzhou, Zhejiang, People's Republic of China
Source
2018 IEEE 23rd International Conference on Digital Signal Processing (DSP) | 2018
Keywords
far-field speech recognition; deep neural network; simulated data; Mandarin Chinese
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Far-field speech recognition is an essential technique for human-machine interaction. It aims to enable smart devices to recognize human speech from a distance. The technology is applied in many scenarios, such as smart home appliances (smart speakers, smart TVs) and meeting transcription. Despite the significant advances in robust and far-field speech recognition since the introduction of deep neural network based acoustic models, far-field speech recognition remains a challenging task because of factors such as background noise, reverberation, and interfering human voices. In this paper, we describe several technical advances for improving the performance of large-scale far-field speech recognition, including simulated data generation and improvements to the front-end modules and the neural network based acoustic models. Experimental results on several Mandarin Chinese speech recognition tasks demonstrate that the combination of these advances significantly outperforms conventional models.
Pages: 5