Intelligibility of reverberant noisy speech with ideal binary masking

Cited by: 32
Authors
Roman, Nicoleta [1]
Woodruff, John [2]
Affiliations
[1] Ohio State Univ, Dept Math Stat & Comp Sci, Lima, OH 45804 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
EARLY REFLECTIONS; SEGREGATION; PERCEPTION; HEARING; RECOGNITION; SEPARATION;
DOI
10.1121/1.3631668
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: it selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: one that uses the direct path of the target as the desired signal, one that uses the direct path and early reflections of the target as the desired signal, and one that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech-shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in speech reception threshold for normal-hearing listeners. © 2011 Acoustical Society of America. [DOI: 10.1121/1.3631668]
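The mask definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ideal_binary_mask` and `truncate_rir`, the parameter `lc_db` (the local threshold in dB), and the toy numbers are all hypothetical. The abstract does not state where the boundary between early and late reflections lies, so `keep_ms` is left as a free parameter. The sketch assumes the per-unit energies of the desired signal and the interference are already available, for example from a short-time Fourier transform or an auditory filterbank.

```python
import numpy as np


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return 1 for time-frequency units whose local SNR exceeds lc_db, else 0.

    target_energy, noise_energy: arrays of the same shape holding the energy
    of the desired signal and of the interference in each unit.
    lc_db: local threshold in dB (hypothetical parameter name).
    """
    eps = np.finfo(float).tiny  # guards against division by zero and log(0)
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


def truncate_rir(rir, fs, keep_ms):
    """Zero a room impulse response after keep_ms past its largest peak.

    Assumption for this sketch: the largest-magnitude tap is the direct path.
    keep_ms = 0 keeps only the direct path; a larger value also keeps early
    reflections; passing the full impulse response length keeps everything.
    """
    peak = int(np.argmax(np.abs(rir)))
    end = peak + int(round(keep_ms * 1e-3 * fs)) + 1
    out = np.zeros_like(rir)
    out[:end] = rir[:end]
    return out


# Toy energies for a 2 x 2 grid of time-frequency units (made-up numbers).
target = np.array([[4.0, 1.0], [0.5, 9.0]])
noise = np.array([[1.0, 2.0], [0.5, 1.0]])
mask_0db = ideal_binary_mask(target, noise, lc_db=0.0)
mask_m6db = ideal_binary_mask(target, noise, lc_db=-6.0)

# Toy impulse response sampled at 1 kHz, so one tap equals 1 ms.
rir = np.array([0.0, 1.0, 0.5, 0.25, 0.125])
direct_only = truncate_rir(rir, fs=1000, keep_ms=0)
direct_plus_early = truncate_rir(rir, fs=1000, keep_ms=2)
```

Under these assumptions, each of the three masks would be obtained by convolving the clean target with a differently truncated impulse response (direct path only, direct path plus early reflections, or the full response), computing the energies of that desired signal per unit, and passing them to `ideal_binary_mask`. What is counted as interference in each case is a design choice the abstract does not spell out.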
Pages: 2153-2161
Number of pages: 9