Modeling Binaural Unmasking of Speech Using a Blind Binaural Processing Stage

Cited by: 15
Authors
Hauth, Christopher F. [1, 2]
Berning, Simon C.
Kollmeier, Birger
Brand, Thomas
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26111 Oldenburg, Germany
[2] Carl von Ossietzky Univ Oldenburg, Cluster Excellence Hearing4All, D-26111 Oldenburg, Germany
Keywords
speech recognition thresholds; binaural processing; auditory model; speech intelligibility prediction; TO-NOISE RATIOS; LEVEL DIFFERENCES; LISTENING EFFORT; NORMAL-HEARING; INTELLIGIBILITY; PREDICTION; EQUALIZATION; MASKING; REDUCTION; THRESHOLD
DOI
10.1177/2331216520975630
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
The equalization-cancellation model is often used to predict the binaural masking level difference. Previously, its application to speech in noise has required separate knowledge about the speech and noise signals to maximize the signal-to-noise ratio (SNR). Here, a novel, blind equalization-cancellation model is introduced that can use the mixed signals. This approach does not require any assumptions about particular sound source directions. It uses different strategies for positive and negative SNRs, with the switching between the two steered by a blind decision stage utilizing modulation cues. The output of the model is a single-channel signal with enhanced SNR, which we analyzed using the speech intelligibility index to compare speech intelligibility predictions. In a first experiment, the model was tested on experimental data obtained in a scenario with spatially separated target and masker signals. Predicted speech recognition thresholds were in good agreement with measured speech recognition thresholds, with a root-mean-square error of less than 1 dB. A second experiment investigated signals at positive SNRs, which were achieved using time-compressed and low-pass filtered speech. The results demonstrated that binaural unmasking of speech occurs at positive SNRs and that the modulation-based switching strategy can predict the experimental results.
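As a rough illustration of the equalization-cancellation idea mentioned in the abstract, the following minimal Python sketch equalizes one ear signal (integer-sample delay and gain) and subtracts it from the other, choosing the parameters from the mixed signals alone. It is not the authors' model: the function names, the grid search, and the choice of minimizing residual power (which cancels the dominant source) are assumptions made for this example, and the sketch omits the frequency-band analysis, the modulation-based decision stage, and the speech intelligibility index back end.

```python
import random


def ec_output(left, right, delay, gain):
    """Equalize the right channel (integer delay, gain) and subtract it from the left."""
    n = len(left)
    out = []
    for i in range(n):
        j = i - delay
        r = right[j] if 0 <= j < n else 0.0  # zero-pad outside the signal
        out.append(left[i] - gain * r)
    return out


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def blind_ec_min(left, right, max_delay, gains):
    """Grid search (hypothetical helper) for the delay and gain that minimize
    the residual power, using only the two mixed ear signals."""
    best = None
    for d in range(-max_delay, max_delay + 1):
        for g in gains:
            p = power(ec_output(left, right, d, g))
            if best is None or p < best[0]:
                best = (p, d, g)
    return best


# Toy scene (assumed for illustration): a masker alone, which reaches the
# right ear 3 samples earlier and at half the amplitude of the left ear.
rng = random.Random(0)
n = 400
masker = [rng.uniform(-1.0, 1.0) for _ in range(n)]
left = masker
right = [0.5 * masker[k + 3] if k + 3 < n else 0.0 for k in range(n)]

residual_power, best_delay, best_gain = blind_ec_min(left, right, 5, [0.5, 1.0, 2.0])
```

In this toy scene the search recovers the interaural delay and gain of the masker and cancels it almost completely; only the first few zero-padded samples remain in the residual.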
Pages: 16