Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition

Cited by: 5
Authors
Fan, Cunhang [1]
Ding, Mingming [1]
Yi, Jiangyan [2]
Li, Jinpeng [3]
Lv, Zhao [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Robust end-to-end ASR; Speech enhancement; Masking and mapping; Speech distortion; Deep spectrum fusion; ENHANCEMENT; NETWORKS; DEREVERBERATION;
DOI
10.1016/j.apacoust.2023.109547
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech enhancement (SE) methods have recently achieved strong performance. However, because of speech distortion, the enhanced speech may lose significant information, which degrades automatic speech recognition (ASR) performance. To address this problem, this paper proposes a two-stage deep spectrum fusion method with a joint training framework for noise-robust end-to-end (E2E) ASR. It consists of a masking and mapping fusion (MMF) module and a gated recurrent fusion (GRF) module. The MMF is the first stage and focuses on SE: it exploits the complementarity of masking-based and mapping-based enhancement methods to alleviate speech distortion. The GRF is the second stage and aims to recover the lost information by fusing the enhanced speech from the MMF with the original input. We conduct extensive experiments on the open Mandarin speech corpus AISHELL-1 with two noise datasets, 100 Nonspeech and NOISEX-92. Experimental results indicate that the proposed method significantly improves performance, achieving a relative character error rate (CER) reduction of 17.36% compared with the conventional joint training method.
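The two fusion stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no layer details, so the fixed blend weight `alpha`, the random gate weights `w` and `b`, and the function names below are all hypothetical. The second stage is also simplified to a plain element-wise sigmoid gate rather than a recurrent one.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masking_mapping_fusion(noisy, mask, mapped, alpha=0.5):
    """Stage 1 (illustrative): blend a masking-based estimate (mask * noisy)
    with a mapping-based estimate. `alpha` is an assumed fixed weight."""
    masked = mask * noisy
    return alpha * masked + (1.0 - alpha) * mapped

def gated_fusion(enhanced, noisy, w, b):
    """Stage 2 (illustrative, non-recurrent): a sigmoid gate chooses, per
    feature, how much enhanced vs. original input to keep."""
    gate = sigmoid(np.concatenate([enhanced, noisy], axis=-1) @ w + b)
    return gate * enhanced + (1.0 - gate) * noisy, gate

# Synthetic stand-ins for magnitude spectra; shapes are (frames, bins).
rng = np.random.default_rng(0)
frames, bins = 4, 8
noisy = np.abs(rng.normal(size=(frames, bins)))
mask = rng.uniform(size=(frames, bins))           # assumed mask estimate
mapped = np.abs(rng.normal(size=(frames, bins)))  # assumed mapping estimate
w = rng.normal(scale=0.1, size=(2 * bins, bins))  # hypothetical gate weights
b = np.zeros(bins)

enhanced = masking_mapping_fusion(noisy, mask, mapped)
fused, gate = gated_fusion(enhanced, noisy, w, b)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the enhanced and the original feature, which is one simple way to let information removed by enhancement flow back from the noisy input.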
Pages: 10