Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

Citations: 0
Authors
Nie, Shuai [1 ,3 ]
Liang, Shan [1 ]
Liu, Bin [1 ,3 ]
Zhang, Yaping [1 ,3 ]
Liu, Wenju [1 ]
Tao, Jianhua [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Funding
National Key Research and Development Program of China;
Keywords
speech enhancement; noise tracking; deep learning; signal processing; RECOGNITION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Noise statistics and speech spectrum characteristics are the essential information for single-channel speech enhancement. Signal processing-based methods rely mainly on noise statistics estimation. They perform very well for stationary noise but struggle with non-stationary noise. Deep learning-based methods, by contrast, focus mainly on perceiving the spectral characteristics of speech and can handle non-stationary noise. However, their performance degrades dramatically for unseen noise types, which could be due to over-reliance on data and neglect of signal processing domain knowledge. A hybrid signal processing/deep learning scheme may therefore be a smart alternative. In this paper, we incorporate the powerful perceptual capabilities of deep learning into the conventional speech enhancement framework. Deep learning is used to estimate the speech presence probability and the update factor of the noise statistics, which are then integrated into a Wiener filter-based speech enhancement structure to enhance the desired speech. All components are jointly optimized with a spectrum approximation objective. Systematic experiments on CHiME-4 and NOISEX-92 demonstrate the effectiveness of the proposed hybrid signal processing/deep learning approach to noise suppression in both noise-unmatched and noise-matched conditions.
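The pipeline summarized in the abstract (estimate a speech presence probability and a noise update factor, track the noise statistics, then apply a Wiener filter) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an MCRA-style recursive noise update and a maximum-likelihood a priori SNR estimate, neither of which is confirmed by this record. The function names, the gain floor, and the hand-picked `spp` and `alpha` values (which stand in for network outputs) are hypothetical.

```python
from typing import List


def update_noise_psd(noise_prev: List[float], noisy_psd: List[float],
                     spp: List[float], alpha: List[float]) -> List[float]:
    """Per-bin recursive noise PSD update (assumed MCRA-style form).

    spp   : speech presence probability in [0, 1] (a network output in the paper)
    alpha : update (smoothing) factor in [0, 1] (a network output in the paper)
    """
    updated = []
    for n_prev, y, p, a in zip(noise_prev, noisy_psd, spp, alpha):
        # When speech is likely present (p near 1), a_eff approaches 1 and
        # the previous noise estimate is kept almost unchanged.
        a_eff = a + (1.0 - a) * p
        updated.append(a_eff * n_prev + (1.0 - a_eff) * y)
    return updated


def wiener_gain(noisy_psd: List[float], noise_psd: List[float],
                floor: float = 1e-3) -> List[float]:
    """Per-bin Wiener gain xi / (1 + xi), with a hypothetical gain floor."""
    gains = []
    for y, n in zip(noisy_psd, noise_psd):
        # Maximum-likelihood a priori SNR estimate (an assumption of this sketch).
        xi = max(y / max(n, 1e-12) - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


if __name__ == "__main__":
    # Toy two-bin example; spp and alpha are hand-picked placeholders.
    noise = [1.0, 1.0]
    frame_psd = [4.0, 1.2]
    noise = update_noise_psd(noise, frame_psd, spp=[0.9, 0.1], alpha=[0.8, 0.8])
    gains = wiener_gain(frame_psd, noise)
    print(noise, gains)
```

In the paper's setting, the two estimators and the filter are trained jointly against a spectrum approximation objective; the sketch above only shows the forward computation for a single frame.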
Pages: 3219-3223
Number of pages: 5
Related papers
50 in total
[41]   An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement [J].
Nossier, Soha A. ;
Wall, Julie ;
Moniri, Mansour ;
Glackin, Cornelius ;
Cannings, Nigel .
ELECTRONICS, 2021, 10 (01) :1-32
[42]   SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement [J].
Gao, Tian ;
Du, Jun ;
Dai, Li-Rong ;
Lee, Chin-Hui .
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :3713-3717
[43]   DEEP RESIDUAL ECHO SUPPRESSION AND NOISE REDUCTION: A MULTI-INPUT FCRN APPROACH IN A HYBRID SPEECH ENHANCEMENT SYSTEM [J].
Franzen, Jan ;
Fingscheidt, Tim .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :666-670
[44]   A Regression Approach to Speech Enhancement Based on Deep Neural Networks [J].
Xu, Yong ;
Du, Jun ;
Dai, Li-Rong ;
Lee, Chin-Hui .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) :7-19
[45]   Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning [J].
Abdullah, Salinna ;
Zamani, Majid ;
Demosthenous, Andreas .
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS, 2024, 5 :141-152
[46]   Two-stage deep learning approach for speech enhancement and reconstruction in the frequency and time domains [J].
Nossier, Soha A. ;
Wall, Julie ;
Moniri, Mansour ;
Glackin, Cornelius ;
Cannings, Nigel .
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
[47]   A TRANSFER LEARNING AND PROGRESSIVE STACKING APPROACH TO REDUCING DEEP MODEL SIZES WITH AN APPLICATION TO SPEECH ENHANCEMENT [J].
Wang, Sicheng ;
Li, Kehuang ;
Huang, Zhen ;
Siniscalchi, Sabato Marco ;
Lee, Chin-Hui .
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, :5575-5579
[48]   A review of deep learning techniques for speech processing [J].
Mehrish, Ambuj ;
Majumder, Navonil ;
Bharadwaj, Rishabh ;
Mihalcea, Rada ;
Poria, Soujanya .
INFORMATION FUSION, 2023, 99
[49]   Survey of Deep Learning Paradigms for Speech Processing [J].
Kishor Barasu Bhangale ;
Mohanaprasad Kothandaraman .
Wireless Personal Communications, 2022, 125 :1913-1949
[50]   Survey of Deep Learning Paradigms for Speech Processing [J].
Bhangale, Kishor Barasu ;
Kothandaraman, Mohanaprasad .
WIRELESS PERSONAL COMMUNICATIONS, 2022, 125 (02) :1913-1949