A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments

Cited by: 19

Authors:

Gao, Tian [1]

Du, Jun [1]

Dai, Li-Rong [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

SPEECH COMMUNICATION | 2017 / Vol. 95

Funding:

National Natural Science Foundation of China;

Keywords:

Speaker-dependent speech processing; Speech enhancement; Speech separation; Deep neural network; Low SNR; NEURAL-NETWORKS; DEEP; ALGORITHM; NOISE;

DOI:

10.1016/j.specom.2017.10.003

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We propose a unified speech enhancement framework, based on deep neural networks (DNNs), to jointly handle both background noise and interfering speech in a speaker-dependent scenario. We first explore speaker-dependent speech enhancement, which can significantly improve system performance over speaker-independent systems. Next, we consider interfering speech as one noise type, so that a single speaker-dependent DNN system can be adopted for both speech enhancement and separation. Experimental results demonstrate that the proposed unified system achieves performance comparable to that of task-specific systems when only noise or only speech interference is present. Furthermore, it obtains much better results than the individual enhancement or separation systems in scenarios that mix background noise and interfering speech. The training data for the two specific tasks are also found to be complementary. Finally, an ensemble learning-based framework is employed to further improve the system performance in low signal-to-noise ratio (SNR) environments. A voice activity detection (VAD) DNN and an ideal ratio mask (IRM) DNN are investigated to provide prior information to integrate two sub-modules at frame level and time-frequency level, respectively. The results demonstrate the effectiveness of the ensemble method in low SNR environments.
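
The abstract does not state how the VAD and IRM outputs are used to integrate the two sub-modules, so the following is only an illustrative sketch under stated assumptions. It assumes the common IRM definition sqrt(S / (S + N)) over speech and noise power, and it assumes the integration is a simple convex blend of two sub-module outputs, weighted per frame (as a VAD probability might be) or per time-frequency bin (as an estimated IRM might be). The function names `ideal_ratio_mask`, `blend_frame_level`, and `blend_tf_level` are hypothetical and do not come from the paper.

```python
import math


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Per-bin IRM, sqrt(S / (S + N)), for [frame][bin] power spectra.

    This is one common IRM definition; the paper may use a variant.
    """
    return [
        [math.sqrt(s / (s + n + eps)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def blend_frame_level(out_a, out_b, frame_weight):
    """Blend two sub-module outputs with one weight per frame.

    Assumption: frame_weight[t] in [0, 1], e.g. a VAD speech probability.
    """
    return [
        [w * a + (1.0 - w) * b for a, b in zip(a_row, b_row)]
        for a_row, b_row, w in zip(out_a, out_b, frame_weight)
    ]


def blend_tf_level(out_a, out_b, tf_weight):
    """Blend two sub-module outputs with one weight per time-frequency bin.

    Assumption: tf_weight[t][f] in [0, 1], e.g. an estimated IRM.
    """
    return [
        [w * a + (1.0 - w) * b for a, b, w in zip(a_row, b_row, w_row)]
        for a_row, b_row, w_row in zip(out_a, out_b, tf_weight)
    ]
```

With a frame weight of 1 the blend returns the first sub-module's frame unchanged, and with 0 it returns the second's; intermediate weights interpolate between them. Whether the paper blends softly like this or switches between sub-modules is not stated in the abstract.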

Pages: 28-39

Page count: 12

