ROBUST MULTI-CHANNEL SPEECH RECOGNITION USING FREQUENCY ALIGNED NETWORK

Citations: 0
Authors
Park, Taejin [1 ]
Kumatani, Kenichi [2 ]
Wu, Minhua [2 ]
Sundaram, Shiva [2 ]
Affiliations
[1] Univ Southern Calif USC, Los Angeles, CA 90007 USA
[2] Amazon Inc, Sunnyvale, CA USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
multi-channel acoustic modeling; beamforming; microphone arrays; automatic speech recognition
DOI
10.1109/icassp40776.2020.9053940
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use a frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin from influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly improves ASR performance. We investigate the effects of the frequency aligned network through ASR experiments on real-world far-field data in which users interact with an ASR system in uncontrolled acoustic environments. Our multi-channel acoustic model with a frequency aligned network achieves up to an 18% relative reduction in word error rate.
Pages: 6859-6863
Number of pages: 5
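
The contrast the abstract draws between a frequency-domain affine layer and a frequency aligned component can be illustrated with a small sketch. This is not the authors' implementation: the function names, the tensor layout (bins by channels), the number of outputs per bin, and the toy sizes below are all assumptions chosen for illustration. The sketch only shows the two properties the abstract states, namely that each output bin depends on the same input bin alone, and that this needs fewer parameters than a full affine map.

```python
import random


def frequency_aligned_layer(x, w):
    """Hypothetical per-bin linear map (illustrative, not the paper's code).

    x[k][m]: complex STFT value for frequency bin k, microphone channel m.
    w[k][d][m]: complex weight for bin k, output d, channel m.
    Returns y[k][d]; output bin k uses only input bin k.
    """
    return [
        [sum(w_kd[m] * x_k[m] for m in range(len(x_k))) for w_kd in w_k]
        for x_k, w_k in zip(x, w)
    ]


def affine_layer(x, a):
    """Full affine map (bias omitted) over the flattened (bin, channel) input.

    a[o][i]: weight from flattened input i to flattened output o,
    so every output can depend on every frequency bin.
    """
    flat = [v for x_k in x for v in x_k]
    return [sum(a_o[i] * flat[i] for i in range(len(flat))) for a_o in a]


def count_params(num_bins, num_channels, num_outputs):
    """Weight counts for the two layers (complex weights counted once)."""
    aligned = num_bins * num_outputs * num_channels
    affine = (num_bins * num_outputs) * (num_bins * num_channels)
    return aligned, affine


def rand_complex(rng):
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


rng = random.Random(0)
K, M, D = 4, 2, 3  # assumed toy sizes: bins, channels, outputs per bin

x = [[rand_complex(rng) for _ in range(M)] for _ in range(K)]
w = [[[rand_complex(rng) for _ in range(M)] for _ in range(D)] for _ in range(K)]
a = [[rand_complex(rng) for _ in range(K * M)] for _ in range(K * D)]

# Perturb a single input value in frequency bin 1.
x_perturbed = [row[:] for row in x]
x_perturbed[1][0] += 1.0

y = frequency_aligned_layer(x, w)
y_p = frequency_aligned_layer(x_perturbed, w)
changed_aligned = [k for k in range(K) if y[k] != y_p[k]]

z = affine_layer(x, a)
z_p = affine_layer(x_perturbed, a)
# Flattened output index o belongs to output bin o // D.
changed_affine = sorted({o // D for o in range(K * D) if z[o] != z_p[o]})

print("bins changed (frequency aligned):", changed_aligned)
print("bins changed (affine):", changed_affine)
print("parameters (aligned, affine):", count_params(K, M, D))
```

With these toy sizes the per-bin layer holds K·D·M = 24 weights against K²·D·M = 96 for the affine layer, so the saving grows linearly with the number of frequency bins.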