A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments

Cited by: 19
Authors
Gao, Tian [1]
Du, Jun [1]
Dai, Li-Rong [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Speaker-dependent speech processing; Speech enhancement; Speech separation; Deep neural network; Low SNR; NEURAL-NETWORKS; DEEP; ALGORITHM; NOISE;
DOI
10.1016/j.specom.2017.10.003
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a unified speech enhancement framework, based on deep neural networks (DNNs), that jointly handles background noise and interfering speech in a speaker-dependent scenario. We first explore speaker-dependent speech enhancement, which significantly improves performance over speaker-independent systems. Next, we treat interfering speech as one type of noise, so that a single speaker-dependent DNN system can be adopted for both speech enhancement and separation. Experimental results demonstrate that the proposed unified system achieves performance comparable to that of task-specific systems when only noise or only speech interference is present. Furthermore, it obtains much better results than the individual enhancement or separation systems when background noise and interfering speech are mixed. The training data for the two specific tasks are also found to be complementary. Finally, an ensemble learning-based framework is employed to further improve performance in low signal-to-noise ratio (SNR) environments. A voice activity detection (VAD) DNN and an ideal ratio mask (IRM) DNN are investigated to provide prior information for integrating two sub-modules at the frame level and the time-frequency level, respectively. The results demonstrate the effectiveness of the ensemble method in low SNR environments.
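The abstract does not say how the VAD and IRM outputs are used to integrate the two sub-modules, so the following is only a minimal sketch under stated assumptions. It shows one common definition of the ideal ratio mask and a simple soft weighting of two sub-module outputs, once per frame and once per time-frequency bin. The function names, the linear weighting rule, and the choice of which sub-module receives which weight are all hypothetical and are not taken from the paper.

```python
import math


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """One common IRM definition per time-frequency bin: sqrt(S / (S + N)).

    speech_power and noise_power are lists of frames; each frame is a list
    of per-frequency power values. eps guards against division by zero.
    """
    return [
        [math.sqrt(s / (s + n + eps)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def fuse_frames(est_a, est_b, speech_prob):
    """Assumed frame-level fusion: one weight per frame (e.g. a VAD posterior).

    Each frame of the output is p * est_a + (1 - p) * est_b.
    """
    return [
        [p * a + (1.0 - p) * b for a, b in zip(a_row, b_row)]
        for a_row, b_row, p in zip(est_a, est_b, speech_prob)
    ]


def fuse_time_frequency(est_a, est_b, mask):
    """Assumed time-frequency fusion: one weight per bin (e.g. an estimated IRM).

    Each bin of the output is m * est_a + (1 - m) * est_b.
    """
    return [
        [m * a + (1.0 - m) * b for a, b, m in zip(a_row, b_row, m_row)]
        for a_row, b_row, m_row in zip(est_a, est_b, mask)
    ]
```

In this sketch the only difference between the two fusion levels is the granularity of the weight: a single value shared by all frequency bins of a frame, or a separate value for every bin.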
Pages: 28-39
Number of pages: 12