An efficient joint training model for monaural noisy-reverberant speech recognition

Cited by: 0
Authors
Lian, Xiaoyu [1 ]
Xia, Nan [1 ]
Dai, Gaole [1 ]
Yang, Hongqin [1 ]
Affiliations
[1] Dalian Polytech Univ, Sch Informat Sci & Engn, Dalian 116034, Liaoning, Peoples R China
Keywords
Deep learning; Speech enhancement; Speech recognition; Attention mechanism; Joint training; NETWORKS; ENHANCEMENT; FRAMEWORK; SIGNAL
DOI
10.1016/j.apacoust.2024.110322
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, harming the performance of downstream speech recognition tasks. This paper constructs a joint training network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, complex-valued channel and temporal-frequency attention (CCTFA) is integrated to focus on the key features of the complex spectrum, and the CCTFA network (CCTFANet) is constructed to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and reduce the parameters and computation required by the attention module; the EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the character error rate (CER) by 3.27%. Compared with other speech recognition models, EWLAC maintains CER while requiring far fewer parameters, less computation, and delivering higher inference speed.
Pages: 13
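The abstract's central efficiency claim is that the EWLA module linearizes attention complexity. The paper's exact element-wise formulation is not given here, so the sketch below shows the general kernelized linear-attention idea it builds on: replacing the softmax with a non-negative feature map lets the key-value product be computed once, dropping the cost from O(N²·d) to O(N·d²) in sequence length N. The feature map `phi` and all array names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized linear attention: O(N * d^2) instead of O(N^2 * d)."""
    Qp, Kp = phi(Q), phi(K)        # non-negative feature maps of queries/keys
    kv = Kp.T @ V                  # (d, d_v): global key-value summary
    z = Qp @ Kp.sum(axis=0)        # (N,): per-query normalizer
    return (Qp @ kv) / z[:, None]  # (N, d_v) attention output

rng = np.random.default_rng(0)
N, d = 16, 8
Q, K, V = rng.normal(size=(3, N, d))  # unpack three (N, d) matrices
out = linear_attention(Q, K, V)
print(out.shape)                       # (16, 8)
```

Because `kv` and the normalizer are fixed-size regardless of N, memory and compute grow linearly with sequence length, which is what allows a Conformer-style recognizer to cut parameters and inference cost while keeping attention-like mixing.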