Coarse-to-fine speech separation method in the time-frequency domain

Cited by: 2
Authors
Yang, Xue [1 ]
Bao, Changchun [1 ]
Chen, Xianhong [1 ]
Affiliation
[1] Beijing Univ Technol, Speech & Audio Signal Proc Lab, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech enhancement; Speech separation; Coarse-to-fine speech separation; Recurrent neural network; Attention mechanism;
DOI
10.1016/j.specom.2023.103003
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Although time-domain speech separation methods have exhibited outstanding performance in anechoic scenarios, their effectiveness is considerably reduced in reverberant scenarios. Compared to time-domain methods, speech separation methods in the time-frequency (T-F) domain operate mainly on structured T-F representations and have recently shown great potential. In this paper, we propose a coarse-to-fine speech separation method in the T-F domain, which involves two steps: 1) a rough separation conducted in the coarse phase and 2) a precise extraction accomplished in the refining phase. In the coarse phase, the speech signals of all speakers are initially separated in a rough manner, which introduces some distortion into the estimated signals. In the refining phase, the T-F representation of each estimated signal acts as a guide to extract the residual T-F representation for the corresponding speaker, which helps reduce the distortions introduced in the coarse phase. In addition, the specially designed networks used for the coarse and refining phases are jointly trained for better performance. Furthermore, by utilizing the recurrent attention with parallel branches (RAPB) block to fully exploit the contextual information contained in the whole T-F features, the proposed model achieves competitive performance on clean datasets with a small number of parameters. The proposed method also shows greater robustness and achieves state-of-the-art results on more realistic datasets.
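The two-phase pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the data flow only: `coarse_phase` and `refining_phase` are deterministic stand-ins (fixed soft masks and a guided residual redistribution) for the paper's trained networks, and the naive STFT/iSTFT pair is assumed rather than taken from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive STFT: window each frame with a Hann window, then FFT."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)                # shape: (frames, bins)

def istft(X, n_fft=256, hop=128):
    """Windowed overlap-add inverse of the naive STFT above."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=-1) * win
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def coarse_phase(mix_tf, n_spk=2):
    """Stand-in for the coarse separation network: fixed soft masks
    that split the mixture roughly among the speakers (hence distorted)."""
    masks = np.full((n_spk,) + mix_tf.shape, 1.0 / n_spk)
    return masks * mix_tf                              # rough per-speaker estimates

def refining_phase(mix_tf, rough_tf):
    """Stand-in for the refining network: each rough T-F estimate guides the
    extraction of its residual T-F representation from the mixture."""
    residual = mix_tf - rough_tf.sum(axis=0)           # what the coarse phase missed
    weights = np.abs(rough_tf) / np.maximum(np.abs(rough_tf).sum(axis=0), 1e-8)
    return rough_tf + weights * residual               # distortion-reduced estimates

mix = np.random.default_rng(0).standard_normal(4096)   # toy "mixture" waveform
mix_tf = stft(mix)
rough = coarse_phase(mix_tf)                           # 1) coarse phase
refined = refining_phase(mix_tf, rough)                # 2) refining phase
waves = [istft(s) for s in refined]                    # one waveform per speaker
```

In the paper both phases are learned networks trained jointly; here the refining step only redistributes the mixture residual in proportion to the magnitude of each rough estimate, which preserves the key invariant that the refined estimates sum back to the mixture T-F representation.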
Pages: 12
Related papers (50 records)
  • [1] Initialization method for speech separation algorithms that work in the time-frequency domain
    Sarmiento, Auxiliadora
    Duran-Diaz, Ivan
    Cruces, Sergio
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 127 (04): : EL121 - EL126
  • [2] Coarse-to-fine Optimization for Speech Enhancement
    Yao, Jian
    Al-Dahle, Ahmad
    INTERSPEECH 2019, 2019, : 2743 - 2747
  • [3] TFPSNET: TIME-FREQUENCY DOMAIN PATH SCANNING NETWORK FOR SPEECH SEPARATION
    Yang, Lei
    Liu, Wei
    Wang, Weiqin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6842 - 6846
  • [4] A Method of Sound Segmentation in Time-Frequency Domain Using Peaks and Valleys in Spectrogram for Speech Separation
    Lim, Sung-Kil
    Lee, Hyon-Soo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2008, 27 (08): : 418 - 426
  • [5] A TIME-FREQUENCY BLIND SEPARATION METHOD FOR UNDERDETERMINED SPEECH MIXTURES
    Lv Yao
    Li Shuangtian
    JOURNAL OF ELECTRONICS (CHINA), 2008, (05) : 702 - 708
  • [6] Underdetermined Blind Source Separation of Anechoic Speech Mixtures in the Time-frequency Domain
    Lv Yao
    Li Shuangtian
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 22 - 25
  • [7] A coarse-to-fine unsupervised domain adaptation method based on metric learning
    Peng, Yaxin
    Yang, Keni
    Zhao, Fangrong
    Shen, Chaomin
    Zhang, Yangchun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (01) : 3013 - 3027
  • [8] Segmentation on time-frequency domain for speech segregation
    Lim, Sung-Kil
    Lee, Hyon-Soo
    2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2, 2006, : 433 - +
  • [9] Watermarking of speech signals in the time-frequency domain
    Al-Khassaweneh, Mahmood
    Al-Zoubi, Hussein
    Aviyente, Selin
    2009 IEEE INTERNATIONAL CONFERENCE ON ELECTRO/INFORMATION TECHNOLOGY, 2009, : 317 - +
  • [10] Neural speech enhancement in the time-frequency domain
    Volkmer, M
    2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003, : 617 - 626