Subband-based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches

Times Cited: 0
Authors
Shi, Hao [1 ]
Wang, Longbiao [2 ]
Li, Sheng [3 ]
Dang, Jianwu [2 ]
Kawahara, Tatsuya [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[3] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
Keywords
Speech enhancement; deep learning; spectrogram fusion; subband fusion; NEURAL-NETWORK;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning brings effective optimization and significant improvements to speech enhancement (SE). Mapping and masking are currently the two major approaches to single-channel frequency-domain SE with supervised learning. In this work, we first show that the two approaches are complementary: mapping is more effective in low-frequency bands, while masking is more suitable in high-frequency bands. This is because high-frequency bands typically have low energy, so directly estimating the enhanced spectrogram there is ineffective. Moreover, when the loss is computed over the entire spectrogram, the contribution of the low-energy regions is often overwhelmed by that of the high-energy regions. To exploit this complementarity, we propose subband-based spectrogram fusion (SBSF), which combines the low-frequency and high-frequency subband spectrograms estimated by different SE models. Experimental evaluations show that SBSF significantly improves SE performance.
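A minimal sketch of the fusion step described in the abstract: the low-frequency subband is taken from a mapping-based estimate and the high-frequency subband from a masking-based estimate. The function name, the `(freq_bins, frames)` layout, and the `cutoff_bin` parameter are illustrative assumptions; this record does not state the paper's exact subband split or fusion details.

```python
import numpy as np

def subband_spectrogram_fusion(mag_mapping, mag_masking, cutoff_bin):
    """Fuse two magnitude spectrograms along the frequency axis.

    mag_mapping, mag_masking: arrays of shape (freq_bins, frames),
    magnitude spectrograms estimated by a mapping-based and a
    masking-based SE model, respectively.
    cutoff_bin: bin index separating the low and high subbands
    (a hypothetical tuning parameter, not specified in this record).
    """
    if mag_mapping.shape != mag_masking.shape:
        raise ValueError("spectrogram shapes must match")
    fused = np.empty_like(mag_mapping)
    fused[:cutoff_bin, :] = mag_mapping[:cutoff_bin, :]   # low band from mapping
    fused[cutoff_bin:, :] = mag_masking[cutoff_bin:, :]   # high band from masking
    return fused
```

The fused magnitude would then be combined with a phase estimate (e.g. the noisy phase) and inverted back to a waveform via the inverse STFT.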
Pages: 286-292
Number of Pages: 7
References
38 in total
[1]  
Choi H.-S., 2018, INT C LEARNING REPRE
[2]  
Erdogan H, 2015, INT CONF ACOUST SPEE, P708, DOI 10.1109/ICASSP.2015.7178061
[3]   SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement [J].
Fu, Szu-Wei ;
Tsao, Yu ;
Lu, Xugang .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :3768-3772
[4]   SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement [J].
Gao, Tian ;
Du, Jun ;
Dai, Li-Rong ;
Lee, Chin-Hui .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :3713-3717
[5]   Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement [J].
Ge, Meng ;
Wang, Longbiao ;
Li, Nan ;
Shi, Hao ;
Dang, Jianwu ;
Li, Xiangang .
INTERSPEECH 2019, 2019, :3153-3157
[6]  
Graves A, 2012, STUD COMPUT INTELL, V385, P37
[7]  
Handa M, 2001, INT CONF ACOUST SPEE, P2761, DOI 10.1109/ICASSP.2001.940218
[8]   FULLSUBNET: A FULL-BAND AND SUB-BAND FUSION MODEL FOR REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT [J].
Hao, Xiang ;
Su, Xiangdong ;
Horaud, Radu ;
Li, Xiaofei .
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :6633-6637
[9]   Evaluation of objective quality measures for speech enhancement [J].
Hu, Yi ;
Loizou, Philipos C. .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01) :229-238
[10]   A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research [J].
Kinoshita, Keisuke ;
Delcroix, Marc ;
Gannot, Sharon ;
Habets, Emanuel A. P. ;
Haeb-Umbach, Reinhold ;
Kellermann, Walter ;
Leutnant, Volker ;
Maas, Roland ;
Nakatani, Tomohiro ;
Raj, Bhiksha ;
Sehr, Armin ;
Yoshioka, Takuya .
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016, :1-19