EXPLORING PRACTICAL ASPECTS OF NEURAL MASK-BASED BEAMFORMING FOR FAR-FIELD SPEECH RECOGNITION

Cited by: 0
Authors
Boeddeker, Christoph [1 ,2 ]
Erdogan, Hakan [1 ]
Yoshioka, Takuya [1 ]
Haeb-Umbach, Reinhold [2 ]
Affiliations
[1] Microsoft AI & Res, Redmond, WA 98052 USA
[2] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Far-field speech recognition; acoustic beamforming; neural networks; time-frequency masks; online processing;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This work examines acoustic beamformers that use neural networks (NNs) for mask prediction as a front-end for automatic speech recognition (ASR) systems in practical scenarios such as voice-enabled home devices. To test the versatility of the mask-predicting network, the system is evaluated with different recording hardware, different microphone array designs, and different acoustic models of the downstream ASR system. Significant gains in recognition accuracy are obtained in all configurations, even though the NN had been trained on mismatched data. Unlike previous work, the NN is trained on a feature-level objective, which gives some performance advantage over a mask-related criterion. Furthermore, different approaches for realizing online, or adaptive, NN-based beamforming are explored; the online algorithms still show significant gains over the baseline performance.
Pages: 6697-6701
Page count: 5