Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Times cited: 4

Authors:

Shi, Hao [1]

Mimura, Masato [1]

Kawahara, Tatsuya [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Speech enhancement; robust automatic speech recognition (ASR); time-frequency hybrid model; spectral information refining; FRAMEWORK;

DOI:

10.1109/TASLP.2024.3407511

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. WaveSpecEnc refines the corresponding temporal feature map with a spectrogram encoding in each encoder layer. Incorporating spectral information yields robust performance in terms of human listening experience, but it brings only a minor improvement in automatic speech recognition (ASR). Thus, we adapt the method for robust ASR by further applying the spectrogram encoding information (WaveSpecEnc+) to both the SE front-end and the ASR back-end. Experimental results on the CHiME-4 dataset show that the proposed method consistently improves ASR performance on the real evaluation sets and outperforms other methods, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is particularly effective, and the effect is confirmed even with a strong ASR baseline using WavLM.


Pages: 3049-3060

Page count: 12
