Multi-stage temporal representation learning via global and local perspectives for real-time speech enhancement

Cited by: 1

Authors:

Chau, Hoang Ngoc [1]

Linh, Nguyen Thi Nhat [1]

Doan, Tuan Kiet [1]

Nguyen, Quoc Cuong [1]

Affiliations:

[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam

Source:

APPLIED ACOUSTICS | 2024 / Vol. 223

Keywords:

Speech enhancement; Deep learning-based; Global and local modeling; Self-attention; Graph convolution; NEURAL-NETWORK; DOMAIN; BEAMFORMER; ATTENTION;

DOI:

10.1016/j.apacoust.2024.110067

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Deep learning-based speech enhancement algorithms have been rapidly developed over the past few years. Although numerous approaches have been proposed, global and local information from speech features has not been thoroughly investigated. In this paper, we introduce a novel and highly effective speech enhancement network called Multi-stage Global-Local Network (MSGLN), which exploits both local and global information via temporal self-attention, temporal graph convolution, and 1D convolution. Local modeling blocks capture the fast changes in speech signals, while global modeling blocks learn long-term trends in noise or speech signals through factors such as pitch, tone, resonance, timbre, and rhythm. In addition, we propose a multi-stage temporal processing module as the bottleneck of a complex convolutional encoder-decoder structure to guide our network to learn different acoustic structures at different scales. Then a dual-path RNN postprocessing module is integrated to reconstruct the speech spectrum mask using a frequency-wise temporal refinement block followed by a frame-wise spectral refinement block. Experimental results demonstrate the superior performance of our proposed methodology compared to other state-of-the-art methods on both real-time single- and multi-channel speech enhancement tasks.

Pages: 10

References: 79 in total

