A UNIFIED SPEAKER-DEPENDENT SPEECH SEPARATION AND ENHANCEMENT SYSTEM BASED ON DEEP NEURAL NETWORKS

Cited by: 0
Authors
Gao, Tian [1 ]
Du, Jun [1 ]
Xu, Li [2 ]
Liu, Cong [2 ]
Dai, Li-Rong [1 ]
Lee, Chin-Hui [3 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] iFlytek Co Ltd, iFlytek Res, Hefei, Anhui, Peoples R China
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING | 2015
Keywords
speech enhancement; speech separation; speaker-dependent; deep neural networks; supervised method; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronics and Communications Technology]
Discipline Classification Codes
0808; 0809
Abstract
Speech enhancement and speech separation are important front-ends of many speech processing systems. In real tasks, background noise is often mixed with interfering human voices. In this paper, we explore a framework that unifies speech enhancement and speech separation for a speaker-dependent scenario based on deep neural networks (DNNs). Using a supervised method, a DNN is adopted to directly model the nonlinear mapping function between noisy and clean speech signals. The signals of interfering speakers are treated as one type of universal noise signal in our framework. To handle a wide range of additive noise in real-world situations, a large training set encompassing many possible combinations of speech and noise types is designed. Experimental results demonstrate that the proposed framework achieves performance comparable to that of dedicated speech enhancement or separation systems. Furthermore, the resulting DNN model, trained on artificially synthesized data, is also effective on noisy speech recorded in real-world conditions.
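To make the described mapping concrete, the sketch below (not taken from the paper) shows how such a speaker-dependent regression DNN could be set up: training pairs are synthesized by mixing the target speaker's clean speech with a noise pool that, in the paper's spirit, also contains interfering speakers' voices, and a feed-forward network is trained with an MSE loss to map log-power spectral (LPS) features of the noisy signal to those of the clean signal. The feature extraction, mixing SNR range, layer sizes, and the `mix_at_snr` / `lps_frames` helpers are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of supervised DNN-based regression from noisy to clean speech features.
# Assumptions (not from the paper): LPS features, 512-point FFT, 2 hidden layers of 1024 units.
import numpy as np
import torch
import torch.nn as nn

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it to `clean`."""
    noise = noise[: len(clean)]
    clean_pow = np.mean(clean ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

def lps_frames(signal, n_fft=512, hop=256):
    """Log-power spectra of overlapping Hann-windowed frames (frames x (n_fft//2 + 1))."""
    win = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft, hop):
        spec = np.fft.rfft(win * signal[start:start + n_fft])
        frames.append(np.log(np.abs(spec) ** 2 + 1e-12))
    return np.array(frames, dtype=np.float32)

# Toy signals standing in for a large multi-condition training set; in the paper the
# "noise" pool also includes interfering speakers' voices (treated as universal noise).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000 * 4)     # placeholder for the target speaker's clean speech
noise = rng.standard_normal(16000 * 4)     # placeholder for noise / interfering speech
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(-5, 15))

x = torch.from_numpy(lps_frames(noisy))    # input: noisy LPS frames
y = torch.from_numpy(lps_frames(clean))    # target: clean LPS frames

# Fully connected regression DNN mapping noisy LPS frames to clean LPS frames.
dim = x.shape[1]
model = nn.Sequential(
    nn.Linear(dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, dim),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # minimum mean squared error regression objective

for epoch in range(5):                     # a real system would train far longer on much more data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: MSE = {loss.item():.4f}")
```

At test time the same network would be applied to noisy LPS frames and the enhanced magnitude spectrum resynthesized with the noisy phase; the paper's actual frame context, network size, and training schedule may differ.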
Pages: 687-691
Number of pages: 5