Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction

Cited by: 188
Authors
Li, Ning [1]
Loizou, Philipos C. [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect Engn, Richardson, TX 75083 USA
DOI
10.1121/1.2832617
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when the masker-dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables them to segregate speech effectively in multitalker environments. (C) 2008 Acoustical Society of America.
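The masking operation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list representation of T-F energies, the strict "greater than" comparison, and the 0 dB default threshold are all assumptions made for illustration. The sketch assumes the target and masker energies in each T-F unit are known separately, which is what makes the mask "ideal".

```python
import math


def ideal_binary_mask(target_energy, masker_energy, threshold_db=0.0):
    """Return a 0/1 mask over T-F units (rows = frequency bands, columns = frames).

    A unit is labeled 1 (target-dominated) when its local SNR in dB exceeds
    threshold_db, and 0 (masker-dominated) otherwise. The default threshold
    is an illustrative assumption, not a value taken from the paper.
    """
    mask = []
    for t_row, m_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(t_row, m_row):
            if t <= 0.0:
                snr_db = -math.inf   # no target energy: treat as masker-dominated
            elif m <= 0.0:
                snr_db = math.inf    # no masker energy: treat as target-dominated
            else:
                snr_db = 10.0 * math.log10(t / m)
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out masker-dominated units and pass the others through intact."""
    return [[x * g for x, g in zip(x_row, g_row)]
            for x_row, g_row in zip(mixture, mask)]


# Hypothetical 2x2 example: local SNRs are about +6, -10, 0 and -20 dB.
target = [[4.0, 0.1], [1.0, 0.01]]
masker = [[1.0, 1.0], [1.0, 1.0]]
mixture = [[5.0, 1.1], [2.0, 1.01]]

mask = ideal_binary_mask(target, masker, threshold_db=-6.0)
masked = apply_mask(mixture, mask)
```

Lowering `threshold_db` retains more units and raising it retains fewer, which is the local SNR threshold the study varies.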
Pages: 1673-1682
Page count: 10
Related papers
30 in total
[1]   Speech separation: Further insights from recordings of event-related brain potentials in humans [J].
Alain, C .
SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, :13-30
[2]  
Bregman, Albert S., 1990, AUDITORY SCENE ANAL, P411, DOI [10.7551/MITPRESS/1486.001.0001, 10.1121/1.408434]
[3]  
[Anonymous], 1969, IEEE T ACOUST SPEECH, VAU17, P225
[4]   Determination of the potential benefit of time-frequency gain manipulation [J].
Anzalone, Michael C. ;
Calandruccio, Lauren ;
Doherty, Karen A. ;
Carney, Laurel H. .
EAR AND HEARING, 2006, 27 (05) :480-492
[5]   Visually-guided attention enhances target identification in a complex auditory scene [J].
Best, Virginia ;
Ozmeral, Erol J. ;
Shinn-Cunningham, Barbara G. .
JARO-JOURNAL OF THE ASSOCIATION FOR RESEARCH IN OTOLARYNGOLOGY, 2007, 8 (02) :294-304
[6]   A speech corpus for multitalker communications research [J].
Bolia, RS ;
Nelson, WT ;
Ericson, MA ;
Simpson, BD .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 107 (02) :1065-1066
[7]   Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation [J].
Brungart, Douglas S. ;
Chang, Peter S. ;
Simpson, Brian D. ;
Wang, DeLiang .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (06) :4007-4018
[8]   Informational and energetic masking effects in the perception of multiple simultaneous talkers [J].
Brungart, DS ;
Simpson, BD ;
Ericson, MA ;
Scott, KR .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 110 (05) :2527-2538
[9]   Informational and energetic masking effects in the perception of two simultaneous talkers [J].
Brungart, DS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (03) :1101-1109
[10]   A glimpsing model of speech perception in noise [J].
Cooke, M .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 119 (03) :1562-1573