ROBUST MULTI-CHANNEL SPEECH RECOGNITION USING FREQUENCY ALIGNED NETWORK

Citations: 0
Authors
Park, Taejin [1]
Kumatani, Kenichi [2]
Wu, Minhua [2]
Sundaram, Shiva [2]
Affiliations
[1] Univ Southern Calif USC, Los Angeles, CA 90007 USA
[2] Amazon Inc, Sunnyvale, CA USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
multi-channel acoustic modeling; beamforming; microphone arrays; automatic speech recognition;
DOI
10.1109/icassp40776.2020.9053940
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use a frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin from influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly improves ASR performance. We investigate the effects of the frequency aligned network through ASR experiments on real-world far-field data in which users interact with an ASR system in uncontrolled acoustic environments. Our multi-channel acoustic model with a frequency aligned network achieves up to an 18% relative reduction in word error rate.
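The contrast the abstract draws between an affine layer and a frequency aligned component can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight layout `W[k][d][c]`, and the toy sizes (K frequency bins, C microphone channels, D outputs per bin) are all assumptions made for illustration, and biases and nonlinearities are omitted. The sketch only shows the structural idea that each frequency bin is mixed across channels with its own weights, so no bin can influence another, and that this uses fewer weights than a full affine map over all bins.

```python
import random


def frequency_aligned_layer(X, W):
    """Mix channels independently within each frequency bin (illustrative only).

    X[k][c]    : complex STFT coefficient for bin k, channel c
    W[k][d][c] : hypothetical weight for bin k, output d, channel c
    Returns Y[k][d]; by construction Y[k] depends only on X[k].
    """
    return [
        [sum(w * x for w, x in zip(w_row, x_bin)) for w_row in w_bin]
        for x_bin, w_bin in zip(X, W)
    ]


def affine_param_count(K, C, D):
    # Full affine map from all K*C inputs to all K*D outputs (bias ignored).
    return (K * C) * (K * D)


def aligned_param_count(K, C, D):
    # One D-by-C mixing matrix per frequency bin (bias ignored).
    return K * D * C


def random_complex(rng):
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def make_inputs(K, C, D, seed=0):
    # Random toy data standing in for multi-channel STFT frames and weights.
    rng = random.Random(seed)
    X = [[random_complex(rng) for _ in range(C)] for _ in range(K)]
    W = [[[random_complex(rng) for _ in range(C)] for _ in range(D)]
         for _ in range(K)]
    return X, W


if __name__ == "__main__":
    K, C, D = 4, 2, 3  # toy sizes, not taken from the paper
    X, W = make_inputs(K, C, D)
    Y = frequency_aligned_layer(X, W)
    print("output shape:", len(Y), "bins x", len(Y[0]), "outputs")
    print("affine weights:", affine_param_count(K, C, D))
    print("frequency aligned weights:", aligned_param_count(K, C, D))
```

With these toy sizes the full affine map needs K times as many weights as the per-bin version, which is the kind of parameter reduction the abstract refers to; the actual layer sizes used in the paper are not given in this record.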
Pages: 6859-6863
Number of pages: 5