Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model

Cited by: 68
Authors
Juergens, Tim [1]
Brand, Thomas [1]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, D-26111 Oldenburg, Germany
Keywords
INTELLIGIBILITY INDEX; IMPAIRED LISTENERS; RECEPTION THRESHOLD; CONFUSIONS; PERCEPTION
DOI
10.1121/1.3224721
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. For this model, "microscopic" is defined in two ways. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is obtained with distance measures that focus mainly on small perceptual distances and neglect outliers. © 2009 Acoustical Society of America. [DOI: 10.1121/1.3224721]
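The template-matching idea behind a dynamic-time-warp recognizer can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`frame_distance`, `dtw_distance`, `recognize`) are hypothetical, the Euclidean frame distance is an assumed stand-in for the perceptual distance measures studied in the paper, and the one-dimensional feature frames are toy values rather than the output of an auditory model.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=frame_distance):
    """Accumulated cost of the best time alignment between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: advance in seq_a, advance in seq_b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(test_seq, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))


# Toy closed-set example: two hypothetical templates and a time-stretched test item.
templates = {
    "a": [[0.0], [1.0], [2.0]],
    "b": [[2.0], [1.0], [0.0]],
}
test_item = [[0.0], [0.0], [1.0], [2.0]]
best_label = recognize(test_item, templates)
```

In this toy setup the test item is a time-stretched copy of template "a", so the warping path aligns it at zero cost and the closed-set decision picks that label. Swapping `frame_distance` for a different function is where alternative distance measures would plug in.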
Pages: 2635-2648
Page count: 14