A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments

Times cited: 19

Authors:

Gao, Tian [1]

Du, Jun [1]

Dai, Li-Rong [1]

Lee, Chin-Hui [2]

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, Anhui, People's Republic of China

[2] Georgia Institute of Technology, Atlanta, GA 30332, USA

Source:

SPEECH COMMUNICATION | 2017 / Vol. 95

Funding:

National Natural Science Foundation of China

Keywords:

Speaker-dependent speech processing; Speech enhancement; Speech separation; Deep neural network; Low SNR; NEURAL-NETWORKS; DEEP; ALGORITHM; NOISE;

DOI:

10.1016/j.specom.2017.10.003

Chinese Library Classification (CLC) code:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We propose a unified speech enhancement framework, based on deep neural networks (DNNs), that jointly handles both background noise and interfering speech in a speaker-dependent scenario. We first explore speaker-dependent speech enhancement, which can significantly improve system performance over speaker-independent systems. Next, we treat interfering speech as one more noise type, so that a single speaker-dependent DNN system can be adopted for both speech enhancement and speech separation. Experimental results demonstrate that the proposed unified system can achieve performance comparable to that of specific systems designed for cases where only noise or only speech interference is present. Furthermore, in scenarios with mixed background noise and interfering speech, it obtains much better results than individual enhancement or separation systems. The training data for the two specific tasks are also found to be complementary. Finally, an ensemble learning-based framework is employed to further improve system performance in low signal-to-noise ratio (SNR) environments. A voice activity detection (VAD) DNN and an ideal ratio mask (IRM) DNN are investigated to provide prior information for integrating two sub-modules at the frame level and the time-frequency level, respectively. The results demonstrate the effectiveness of the ensemble method in low SNR environments.
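The abstract names two kinds of prior information for combining the two sub-modules: a per-frame VAD output and a per-bin IRM. Below is a minimal Python/NumPy sketch, not the authors' implementation. It assumes the common IRM definition IRM(t, f) = (S^2 / (S^2 + N^2))^β with β = 0.5 (the abstract does not state the exponent), and it assumes a simple convex combination of the two sub-modules' spectral estimates, weighted per frame by a speech probability or per time-frequency bin by a mask. The function names and the weighting rule are hypothetical illustrations of what frame-level and T-F-level integration can look like.

```python
import numpy as np


def ideal_ratio_mask(speech_mag: np.ndarray, noise_mag: np.ndarray,
                     beta: float = 0.5, eps: float = 1e-12) -> np.ndarray:
    """IRM(t, f) = (S^2 / (S^2 + N^2)) ** beta on (frames, bins) magnitudes.

    beta = 0.5 is an assumed, commonly used exponent; eps avoids 0/0 in
    silent bins.
    """
    s_pow = speech_mag ** 2
    n_pow = noise_mag ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


def fuse_frame_level(est_a: np.ndarray, est_b: np.ndarray,
                     vad_prob: np.ndarray) -> np.ndarray:
    """Hypothetical frame-level fusion of two (frames, bins) estimates.

    vad_prob has shape (frames,); each frame's speech probability weights
    sub-module A against sub-module B for every bin in that frame.
    """
    w = np.clip(vad_prob, 0.0, 1.0)[:, None]  # broadcast over frequency bins
    return w * est_a + (1.0 - w) * est_b


def fuse_tf_level(est_a: np.ndarray, est_b: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    """Hypothetical T-F-level fusion: each (t, f) cell gets its own weight."""
    w = np.clip(mask, 0.0, 1.0)
    return w * est_a + (1.0 - w) * est_b


if __name__ == "__main__":
    speech = np.array([[3.0, 0.0], [4.0, 1.0]])
    noise = np.array([[4.0, 1.0], [3.0, 0.0]])
    irm = ideal_ratio_mask(speech, noise)
    print(np.round(irm, 3))  # per-bin speech dominance in [0, 1]

    est_a, est_b = np.ones((2, 2)), np.zeros((2, 2))
    print(fuse_frame_level(est_a, est_b, np.array([1.0, 0.25])))
    print(fuse_tf_level(est_a, est_b, irm))
```

With a speech probability of 1 the frame-level rule passes sub-module A's estimate through unchanged, and with 0 it uses sub-module B's; the T-F rule makes the same trade-off independently for every bin, which is why a mask can help where speech and interference overlap within a single frame.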


Pages: 28-39

Page count: 12
