EXPLORING WAVLM ON SPEECH ENHANCEMENT

Cited by: 12
Authors
Song, Hyungchan [1 ]
Chen, Sanyuan [2 ]
Chen, Zhuo [3 ]
Wu, Yu [2 ]
Yoshioka, Takuya [3 ]
Tang, Min [3 ]
Shin, Jong Won [1 ]
Liu, Shujie [2 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Gwangju, South Korea
[2] Microsoft, Beijing, Peoples R China
[3] Microsoft, New York, NY USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
self-supervised learning; speech enhancement; fine-tuning
DOI
10.1109/SLT54892.2023.10023356
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Interest in self-supervised learning approaches for end-to-end speech encoding has surged in recent years owing to their great success. In particular, WavLM has shown state-of-the-art performance on a variety of speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work we design and conduct a series of experiments under three resource conditions by combining WavLM with two high-quality speech enhancement systems. We also propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost downstream enhancement performance. Experiments on the DNS challenge dataset and a simulation dataset show that WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially when fine-tuning resources are scarce. Under the high fine-tuning resource condition, only the word error rate is substantially improved.
Pages: 451-457
Page count: 7
References
32 references in total
[1] Baevski A., 2019, International Conference on Learning Representations.
[2] Baevski A., 2020, Advances in Neural Information Processing Systems, Vol 33.
[3] Braun, Sebastian; Gamper, Hannes; Reddy, Chandan K. A.; Tashev, Ivan. Towards Efficient Models for Real-Time Deep Noise Suppression. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 656-660.
[4] Braun, Sebastian; Tashev, Ivan. A Consolidated View of Loss Functions for Supervised Deep Learning-Based Speech Enhancement. 2021 44th International Conference on Telecommunications and Signal Processing (TSP), 2021: 72-76.
[5] Chen, Sanyuan; Wu, Yu; Wang, Chengyi; Chen, Zhengyang; Chen, Zhuo; Liu, Shujie; Wu, Jian; Qian, Yao; Wei, Furu; Li, Jinyu; Yu, Xiangzhan. UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 6152-6156.
[6] Chen, Sanyuan; Wu, Yu; Chen, Zhuo; Wu, Jian; Li, Jinyu; Yoshioka, Takuya; Wang, Chengyi; Liu, Shujie; Zhou, Ming. Continuous Speech Separation with Conformer. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 5749-5753.
[7] Chen Sanyuan, 2022, IEEE Journal of Selected Topics in Signal Processing.
[8] Chi ZW, 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol 1: Long Papers, p. 6170.
[9] Chung, Yu-An; Zhang, Yu; Han, Wei; Chiu, Chung-Cheng; Qin, James; Pang, Ruoming; Wu, Yonghui. W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021: 244-250.
[10] Defossez, Alexandre; Synnaeve, Gabriel; Adi, Yossi. Real Time Speech Enhancement in the Waveform Domain. INTERSPEECH 2020, 2020: 3291-3295.