A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments

Cited by: 19
Authors
Gao, Tian [1]
Du, Jun [1]
Dai, Li-Rong [1]
Lee, Chin-Hui [2]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, Anhui, People's Republic of China
[2] Georgia Institute of Technology, Atlanta, GA 30332, USA
Funding
National Natural Science Foundation of China
Keywords
Speaker-dependent speech processing; Speech enhancement; Speech separation; Deep neural network; Low SNR; NEURAL-NETWORKS; DEEP; ALGORITHM; NOISE
DOI
10.1016/j.specom.2017.10.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We propose a unified speech enhancement framework to jointly handle both background noise and interfering speech in a speaker-dependent scenario based on deep neural networks (DNNs). We first explore speaker-dependent speech enhancement that can significantly improve system performance over speaker-independent systems. Next, we consider interfering speech as one noise type, thus a speaker-dependent DNN system can be adopted for both speech enhancement and separation. Experimental results demonstrate that the proposed unified system can achieve comparable performances to specific systems where only noise or speech interference is present. Furthermore, much better results can be obtained over individual enhancement or separation systems in mixed background noise and interfering speech scenarios. The training data for the two specific tasks are also found to be complementary. Finally, an ensemble learning-based framework is employed to further improve the system performance in low signal-to-noise ratio (SNR) environments. A voice activity detection (VAD) DNN and an ideal ratio mask (IRM) DNN are investigated to provide prior information to integrate two sub-modules at frame level and time-frequency level, respectively. The results demonstrate the effectiveness of the ensemble method in low SNR environments.
Pages: 28-39
Page count: 12