An efficient joint training model for monaural noisy-reverberant speech recognition

Cited by: 0
Authors
Lian, Xiaoyu [1 ]
Xia, Nan [1 ]
Dai, Gaole [1 ]
Yang, Hongqin [1 ]
Affiliations
[1] Dalian Polytech Univ, Sch Informat Sci & Engn, Dalian 116034, Liaoning, Peoples R China
Keywords
Deep learning; Speech enhancement; Speech recognition; Attention mechanism; Joint training; Networks; Enhancement; Framework; Signal
DOI
10.1016/j.apacoust.2024.110322
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, hurting the performance of downstream speech recognition tasks. This paper constructs a joint training network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, complex-valued channel and temporal-frequency attention (CCTFA) is integrated to focus on the key features of the complex spectrum, and the CCTFA network (CCTFANet) is constructed to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and reduce the number of parameters and computations required by the attention module, and the EWLA Conformer (EWLAC) is constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the CER by 3.27%. Compared with other speech recognition models, EWLAC maintains a comparable CER while requiring far fewer parameters and computations and achieving higher inference speed.
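The abstract does not give the exact EWLA formulation, but "linearizing the attention complexity" generally refers to the kernel-feature-map trick: replacing softmax(QKᵀ)V, which costs O(N²d) in sequence length N, with φ(Q)(φ(K)ᵀV), which costs O(Nd²). The sketch below illustrates that generic trick only; the function name, the choice of φ (elu + 1), and all shapes are assumptions for illustration, not the paper's actual module.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention (illustrative, not the paper's EWLA).

    Instead of softmax(Q @ K.T) @ V  -> O(N^2 * d),
    compute phi(Q) @ (phi(K).T @ V)  -> O(N * d^2),
    where phi(x) = elu(x) + 1 keeps all attention weights positive.
    Q, K: (N, d); V: (N, d_v); returns (N, d_v).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d_v) summary of keys/values, built once
    Z = Qp @ Kp.sum(axis=0)       # (N,) per-query normalizer (row weight sums)
    return (Qp @ KV) / (Z[:, None] + eps)
```

Because φ is positive and each row is normalized by its total weight, every output row is a convex combination of the rows of V, mirroring softmax attention's averaging behavior at linear cost in N.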
Pages: 13
Related papers
50 records in total
  • [21] SNRi Target Training for Joint Speech Enhancement and Recognition
    Koizumi, Yuma
    Karita, Shigeki
    Narayanan, Arun
    Panchapagesan, Sankaran
    Bacchiani, Michiel
    INTERSPEECH 2022, 2022, : 1173 - 1177
  • [22] Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment
    Cai, Danwei
    Qin, Xiaoyi
    Li, Ming
    INTERSPEECH 2019, 2019, : 4365 - 4369
  • [24] Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation
    Alam, Md Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    Dumouchel, Pierre
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015, : 1 - 13
  • [25] Robust front-end for speech recognition by human and machine in noisy reverberant environments: the effect of phase information
    Liu, Yang
    Nower, Naushin
    Morita, Shota
    Unoki, Masashi
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [26] A Global Discriminant Joint Training Framework for Robust Speech Recognition
    Li, Lujun
    Kuerzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 544 - 551
  • [27] A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions
    Yang, Zhao
    Ng, Dianwen
    Zhang, Chong
    Jiang, Rui
    Xi, Wei
    Ma, Yukun
    Ni, Chongjia
    Zhao, Jizhong
    Ma, Bin
    Chng, Eng Siong
    INTERSPEECH 2023, 2023, : 4953 - 4957
  • [28] Auditory model for robust speech recognition in real world noisy environments
    Kim, DS
    Lee, SY
    Kil, RM
    Zhu, XL
    ELECTRONICS LETTERS, 1997, 33 (01) : 12 - 13
  • [29] Joint Bottleneck Feature and Attention Model for Speech Recognition
    Long Xingyan
    Qu Dan
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2018), 2018, : 46 - 50
  • [30] Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition
    Wiesner, Matthew
    Sarma, Mousmita
    Arora, Ashish
    Raj, Desh
    Gao, Dongji
    Huang, Ruizhe
    Preet, Supreet
    Johnson, Moris
    Iqbal, Zikra
    Goel, Nagendra
    Trmal, Jan
    Garcia, Paola
    Khudanpur, Sanjeev
    INTERSPEECH 2021, 2021, : 2906 - 2910