Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Cited by: 4
Authors
Shi, Hao [1]
Mimura, Masato [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
Speech enhancement; robust automatic speech recognition (ASR); time-frequency hybrid model; spectral information refining; FRAMEWORK;
DOI
10.1109/TASLP.2024.3407511
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. In each encoder layer, WaveSpecEnc refines the corresponding temporal feature map using a spectrogram encoding. Incorporating spectral information gives robust performance in terms of human listening quality, but yields only a minor improvement in automatic speech recognition (ASR). We therefore extend the method for robust ASR (WaveSpecEnc+) by supplying the spectrogram encoding information to both the SE front-end and the ASR back-end. Experimental results on the CHiME-4 dataset show that the proposed method consistently improves ASR performance on the real evaluation sets and outperforms other methods, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is particularly effective, and the effect is confirmed even with a strong ASR baseline using WavLM.
Pages: 3049-3060
Page count: 12
Related papers
50 in total
  • [31] Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures
    Moore, A. H.
    Parada, P. Peso
    Naylor, P. A.
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 574 - 584
  • [32] Magnitude Spectrum Enhancement for Robust Speech Recognition
    Tu, Wen-hsiang
    Hung, Jeih-weih
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4586 - 4589
  • [33] Semantic Enhancement Framework for Robust Speech Recognition
    Yang, Baochen
    Yu, Kai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 81 - 88
  • [34] On the Use of Spectrogram Inversion for Speech Enhancement
    Bedoui, Raja Abdelmalek
    Mnasri, Zied
    Benzarti, Faouzi
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021, : 852 - 857
  • [35] Nonlinear Enhancement of Onset for Robust Speech Recognition
    Kim, Chanwoo
    Stern, Richard M.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2058 - +
  • [36] SPEECH RECOGNITION THROUGH SPECTROGRAM MATCHING
    INGEMANN, F
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1974, 56 : S27 - S27
  • [37] SPEECH RECOGNITION THROUGH SPECTROGRAM MATCHING
    INGEMANN, F
    MERMELSTEIN, P
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1975, 57 (01): : 253 - 255
  • [38] Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN
    Enriquez, Marc Dominic
    Lucas, Crisron Rudolf
    Aquino, Angelina
    2023 34TH IRISH SIGNALS AND SYSTEMS CONFERENCE, ISSC, 2023,
  • [39] EXEMPLAR-BASED NOISE ROBUST AUTOMATIC SPEECH RECOGNITION USING MODULATION SPECTROGRAM FEATURES
    Baby, Deepak
    Virtanen, Tuomas
    Gemmeke, Jort F.
    Barker, Tom
    Van Hamme, Hugo
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 519 - 524
  • [40] Combined Waveform-Cepstral Representation for Robust Speech Recognition
    Ager, Matthew
    Cvetkovic, Zoran
    Sollich, Peter
    2011 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS (ISIT), 2011, : 864 - 868