An efficient joint training model for monaural noisy-reverberant speech recognition

Times cited: 0
Authors
Lian, Xiaoyu [1]
Xia, Nan [1]
Dai, Gaole [1]
Yang, Hongqin [1]
Affiliations
[1] Dalian Polytech Univ, Sch Informat Sci & Engn, Dalian 116034, Liaoning, Peoples R China
Keywords
Deep learning; Speech enhancement; Speech recognition; Attention mechanism; Joint training; NETWORKS; ENHANCEMENT; FRAMEWORK; SIGNAL
DOI
10.1016/j.apacoust.2024.110322
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Noise and reverberation can seriously degrade speech quality and intelligibility, which harms the performance of downstream speech recognition tasks. This paper constructs a jointly trained network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, a complex-valued channel and temporal-frequency attention (CCTFA) module is integrated to focus on the key features of the complex spectrum. The CCTFA network (CCTFANet) is then constructed to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and to reduce the number of parameters and computations the attention module requires. The EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the character error rate (CER) by 3.27%. Compared with other speech recognition models, EWLAC maintains CER while achieving a much lower parameter count, lower computational overhead, and higher inference speed.
Pages: 13
Related papers
50 in total
  • [31] Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model
    Liu, Jindong
    Yang, Guang-Zhong
    SPEECH COMMUNICATION, 2015, 67 : 65 - 77
  • [32] NOISE MODEL TRANSFER USING AFFINE TRANSFORMATION WITH APPLICATION TO LARGE VOCABULARY REVERBERANT SPEECH RECOGNITION
    Yoshioka, Takuya
    Nakatani, Tomohiro
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7058 - 7062
  • [33] Recognition of Noisy Speech: A Comparative Survey of Robust Model Architecture and Feature Enhancement
    Björn Schuller
    Martin Wöllmer
    Tobias Moosmayr
    Gerhard Rigoll
    EURASIP Journal on Audio, Speech, and Music Processing, 2009
  • [34] Noisy Speech Training in MFCC-based Speech Recognition with Noise Suppression Toward Robot Assisted Autism therapy
    Attawibulkul, Sujirat
    Kaewkamnerdpong, Boonserm
    Miyanaga, Yoshikazu
    2017 10TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2017
  • [35] Adaptive Training with Noisy Constrained Maximum Likelihood Linear Regression for Noise Robust Speech Recognition
    Kim, D. K.
    Gales, M. J. F.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2367 - 2370
  • [36] ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
    Prudnikov, Alexey
    Korenevsky, Maxim
    Aleinik, Sergei
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 401 - 408
  • [37] Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition
    Gao, Tian
    Du, Jun
    Xu, Yong
    Liu, Cong
    Dai, Li-Rong
    Lee, Chin-Hui
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016
  • [38] Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition
    Tian Gao
    Jun Du
    Yong Xu
    Cong Liu
    Li-Rong Dai
    Chin-Hui Lee
    EURASIP Journal on Advances in Signal Processing, 2016
  • [39] Acoustic model training for speech recognition over mobile networks
    Vojtko, Juraj
    Kacur, Juraj
    Rozinaj, Gregor
    Korosi, Jan
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2013, 6 (02) : 65 - 74
  • [40] Hybrid model of hidden Markov models and wavelet neural network in noisy speech recognition
    Lin Sui-fang
    Pan Yong-xiang
    Sun Xu-xia
    Proceedings of 2005 Chinese Control and Decision Conference, Vols 1 and 2, 2005, : 675 - 678