PHASE CONTINUITY: LEARNING DERIVATIVES OF PHASE SPECTRUM FOR SPEECH ENHANCEMENT

Times cited: 4
Authors
Kim, Doyeon [1]
Han, Hyewon [1]
Shin, Hyeon-Kyeong [1, 2]
Chung, Soo-Whan [2]
Kang, Hong-Goo [1]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
[2] Naver Corp, Seongnam, South Korea
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
speech enhancement; denoising; phase reconstruction; phase continuity loss
DOI
10.1109/ICASSP43922.2022.9746087
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, so they have little effect on the quality of the enhanced speech. In this paper, we propose an effective phase reconstruction strategy for neural speech enhancement that can operate in noisy environments. Specifically, we introduce a phase continuity loss that considers relative phase variations across the time and frequency axes. By adding this phase continuity loss to a state-of-the-art neural speech enhancement system trained with a reconstruction loss and several magnitude spectral losses, we show that the proposed method further improves the quality of the enhanced speech over the baseline, especially when it is trained jointly with a magnitude spectrum loss.
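The abstract describes the loss only as penalizing relative phase variations along the time and frequency axes. As a rough illustration, and not the authors' exact formulation, the NumPy sketch below approximates those phase derivatives with wrapped finite differences of a complex STFT laid out as (frequency, time): the time difference acts like an instantaneous-frequency term and the frequency difference like a group-delay term. The function names, the L1 distance, and the equal weighting of the two axes are assumptions made for this example.

```python
import numpy as np


def wrap(phase):
    """Wrap angles to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))


def phase_derivatives(spec):
    """Wrapped phase differences of a complex STFT shaped (freq, time).

    Hypothetical helper: finite differences stand in for the phase derivatives.
    """
    phase = np.angle(spec)
    d_time = wrap(np.diff(phase, axis=1))  # instantaneous-frequency-like term
    d_freq = wrap(np.diff(phase, axis=0))  # group-delay-like term
    return d_time, d_freq


def phase_continuity_loss(est_spec, ref_spec):
    """Hypothetical L1 loss between the phase derivatives of two spectra.

    Differences are re-wrapped so that errors near +/-pi are not overcounted.
    """
    est_t, est_f = phase_derivatives(est_spec)
    ref_t, ref_f = phase_derivatives(ref_spec)
    loss_t = np.mean(np.abs(wrap(est_t - ref_t)))
    loss_f = np.mean(np.abs(wrap(est_f - ref_f)))
    return loss_t + loss_f
```

Because only phase differences are compared, a global phase rotation or a change in magnitude leaves this sketch's loss at (numerically) zero, which is the sense in which it measures relative rather than absolute phase error. A training setup would presumably compute the same quantity in a differentiable framework and combine it with magnitude and reconstruction losses, as the abstract describes.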
Pages: 6942-6946
Number of pages: 5