Subband-based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches

Times Cited: 0
Authors
Shi, Hao [1 ]
Wang, Longbiao [2 ]
Li, Sheng [3 ]
Dang, Jianwu [2 ]
Kawahara, Tatsuya [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto, Japan
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[3] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
Keywords
Speech enhancement; deep learning; spectrogram fusion; subband fusion; NEURAL-NETWORK;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning brings effective optimization and significant improvements to speech enhancement (SE). Mapping and masking are currently the two major approaches to single-channel frequency-domain SE with supervised learning. In this work, we first show that the two approaches are complementary: mapping is more effective in low-frequency bands, while masking is more suitable in high-frequency bands. This is because high-frequency bands typically have low energy, so directly estimating the enhanced spectrogram there is ineffective. Moreover, when the loss is computed over the entire spectrogram, the contribution of the low-energy regions is often overwhelmed by that of the high-energy regions. To exploit this complementarity, we propose subband-based spectrogram fusion (SBSF), which combines the low-frequency and high-frequency subband spectrograms estimated by different SE models. Experimental evaluations show that SBSF significantly improves SE performance.
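A minimal sketch of the fusion step described in the abstract: the low-frequency subband is taken from a mapping-based estimate and the high-frequency subband from a masking-based estimate. The function name, the `(freq_bins, frames)` layout, and the `cutoff_bin` parameter are illustrative assumptions; this record does not state the paper's exact subband split or fusion details.

```python
import numpy as np

def subband_spectrogram_fusion(mag_mapping, mag_masking, cutoff_bin):
    """Fuse two magnitude spectrograms along the frequency axis.

    mag_mapping, mag_masking: arrays of shape (freq_bins, frames),
    magnitude spectrograms estimated by a mapping-based and a
    masking-based SE model, respectively.
    cutoff_bin: bin index separating the low and high subbands
    (a hypothetical tuning parameter, not specified in this record).
    """
    if mag_mapping.shape != mag_masking.shape:
        raise ValueError("spectrogram shapes must match")
    fused = np.empty_like(mag_mapping)
    fused[:cutoff_bin, :] = mag_mapping[:cutoff_bin, :]   # low band from mapping
    fused[cutoff_bin:, :] = mag_masking[cutoff_bin:, :]   # high band from masking
    return fused
```

The fused magnitude would then be combined with a phase estimate (e.g. the noisy phase) and inverted back to a waveform via the inverse STFT.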
Pages: 286-292
Number of Pages: 7
References
38 in total
[1]  
Choi H.-S., 2018, INT C LEARNING REPRE
[2]  
Erdogan H, 2015, INT CONF ACOUST SPEE, P708, DOI 10.1109/ICASSP.2015.7178061
[3]   SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement [J].
Fu, Szu-Wei ;
Tsao, Yu ;
Lu, Xugang .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :3768-3772
[4]   SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement [J].
Gao, Tian ;
Du, Jun ;
Dai, Li-Rong ;
Lee, Chin-Hui .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :3713-3717
[5]   Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement [J].
Ge, Meng ;
Wang, Longbiao ;
Li, Nan ;
Shi, Hao ;
Dang, Jianwu ;
Li, Xiangang .
INTERSPEECH 2019, 2019, :3153-3157
[6]  
Graves A, 2012, STUD COMPUT INTELL, V385, P37
[7]  
Handa M, 2001, INT CONF ACOUST SPEE, P2761, DOI 10.1109/ICASSP.2001.940218
[8]   FULLSUBNET: A FULL-BAND AND SUB-BAND FUSION MODEL FOR REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT [J].
Hao, Xiang ;
Su, Xiangdong ;
Horaud, Radu ;
Li, Xiaofei .
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :6633-6637
[9]   Evaluation of objective quality measures for speech enhancement [J].
Hu, Yi ;
Loizou, Philipos C. .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01) :229-238
[10]   A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research [J].
Kinoshita, Keisuke ;
Delcroix, Marc ;
Gannot, Sharon ;
Habets, Emanuel A. P. ;
Haeb-Umbach, Reinhold ;
Kellermann, Walter ;
Leutnant, Volker ;
Maas, Roland ;
Nakatani, Tomohiro ;
Raj, Bhiksha ;
Sehr, Armin ;
Yoshioka, Takuya .
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016, :1-19