Ideal ratio mask estimation using deep neural networks for monaural speech segregation in noisy reverberant conditions

Cited by: 22
Authors
Li, Xu [1 ,2 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Xinjiang Lab Minor Speech & Language Informat Pro, Xinjiang, Peoples R China
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
National Natural Science Foundation of China;
Keywords
speech segregation; deep neural networks; ideal ratio mask; INTELLIGIBILITY; BINARY; REFLECTIONS; ALGORITHM;
DOI
10.21437/Interspeech.2017-549
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monaural speech segregation is an important problem in robust speech processing and has been formulated as a supervised learning problem. In supervised learning methods, the ideal binary mask (IBM) is usually used as the target because of its simplicity and the large speech intelligibility gains it yields. Recently, the ideal ratio mask (IRM) has been found to improve speech quality over the IBM. However, the IRM was originally defined for anechoic conditions and does not account for the effect of reverberation. In this paper, the IRM is extended to reverberant conditions, where the direct sound and early reflections of the target speech are regarded as the desired signal. Deep neural networks (DNNs) are employed to estimate the extended IRM in noisy reverberant conditions. The estimated IRM is then applied to the noisy reverberant mixture for speech segregation. Experimental results show that the estimated IRM provides substantial improvements in speech intelligibility and speech quality over the unprocessed mixture signals under various noisy and reverberant conditions.
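The extended mask described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the code assumes the common square-root energy-ratio form of the IRM, treats late reverberation plus noise as the interference, and uses an assumed 50 ms early/late boundary. The helper names (split_rir, power_spectrogram, extended_irm), the frame settings, and the synthetic signals are all hypothetical, and the DNN that would estimate the mask from the mixture is not shown.

```python
import numpy as np

def split_rir(rir, fs, early_ms=50.0):
    """Split a room impulse response into direct+early and late parts.

    The boundary is counted from the direct-path peak. The 50 ms default
    is an assumption for illustration, not a value taken from the paper.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    cut = min(len(rir), peak + int(round(early_ms * 1e-3 * fs)))
    early = np.zeros_like(rir)
    early[:cut] = rir[:cut]
    late = rir - early
    return early, late

def power_spectrogram(x, frame_len=320, hop=160):
    """Power spectrogram (frames x bins) from a Hann-windowed framed FFT."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def extended_irm(desired, residual, eps=1e-12):
    """Assumed mask form: sqrt(desired energy / (desired + residual energy))."""
    s = power_spectrogram(desired)
    n = power_spectrogram(residual)
    return np.sqrt(s / (s + n + eps))

# Synthetic stand-ins (white noise as "speech", decaying noise as the RIR).
rng = np.random.default_rng(0)
fs = 16000
speech = rng.standard_normal(fs)
rir = 0.1 * rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)
rir[0] = 1.0  # direct path

early, late = split_rir(rir, fs)
desired = np.convolve(speech, early)[:len(speech)]    # direct sound + early reflections
late_rev = np.convolve(speech, late)[:len(speech)]    # late reverberation
noise = 0.5 * rng.standard_normal(len(speech))        # additive background noise
mask = extended_irm(desired, late_rev + noise)
```

In a segregation system of the kind the abstract outlines, a mask like this would serve as the training target, and the network's estimate would be multiplied with the time-frequency representation of the noisy reverberant mixture before resynthesis.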
Pages: 1203-1207
Number of pages: 5