NON-INTRUSIVE BINAURAL PREDICTION OF SPEECH INTELLIGIBILITY BASED ON PHONEME CLASSIFICATION

Cited by: 11
Authors
Rossbach, Jana [1, 3]
Roettges, Saskia [2, 3]
Hauth, Christopher F. [2, 3]
Brand, Thomas [2, 3]
Meyer, Bernd T. [1, 3]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, Commun Acoust, Oldenburg, Germany
[2] Carl von Ossietzky Univ Oldenburg, Med Phys, Oldenburg, Germany
[3] Cluster Excellence Hearing4all, Hannover, Germany
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speech intelligibility prediction; binaural; non-intrusive; PERCEPTION; HEARING
DOI
10.1109/ICASSP39728.2021.9413874
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities, which degrade in the presence of noise and reverberation; this degradation is quantified with an entropy-based measure. The model output is used to predict speech recognition thresholds, i.e., the signal-to-noise ratio at which word recognition accuracy is 50%. The predictions are compared to measured data obtained from eight normal-hearing listeners in acoustic scenarios with varying positions of localized maskers, different rooms, and different reverberation times. Although the model is non-intrusive, it produces a root mean squared error in the range of 0.6-2.1 dB, which is similar to the results obtained with a reference model (0.3-1.8 dB) that uses oracle knowledge in both the frontend and the backend stage.
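The abstract does not give the exact form of the entropy-based measure, so the following is only a minimal sketch under stated assumptions: it uses plain Shannon entropy averaged over frames of phoneme posteriors, and a standard root mean squared error between predicted and measured thresholds. The function names and the toy posterior values are hypothetical and are not taken from the paper.

```python
import math

def frame_entropy(probs):
    """Shannon entropy (bits) of one frame of phoneme probabilities.
    Assumption: plain Shannon entropy; the paper's measure may differ."""
    total = sum(probs)  # renormalize in case the frame does not sum to 1
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

def mean_entropy(posteriors):
    """Average frame entropy over a list of frames (each a list of probabilities)."""
    return sum(frame_entropy(frame) for frame in posteriors) / len(posteriors)

def rmse(predicted, measured):
    """Root mean squared error between two equally long lists, e.g. thresholds in dB."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy posteriors over four phoneme classes (illustrative values only).
confident_frames = [[0.97, 0.01, 0.01, 0.01]] * 3  # peaked: low entropy
degraded_frames = [[0.25, 0.25, 0.25, 0.25]] * 3   # flat: maximal entropy

print(mean_entropy(confident_frames) < mean_entropy(degraded_frames))
```

In this toy setting, flatter posteriors give a higher mean entropy, which is the kind of degradation indicator the abstract describes. How such a value would be mapped to a threshold in dB is not stated in the abstract and is left open here.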
Pages: 396-400
Page count: 5
Related papers
20 in total
[1] ANSI S3.5, 1997, AM NATL STANDARD MET
[2] Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners [J].
Beutelmann, Rainer;
Brand, Thomas.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (01): 331-342
[3] A glimpsing model of speech perception in noise [J].
Cooke, M.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 119 (03): 1562-1573
[4] Subcomponent cues in binaural unmasking [J].
Culling, John F.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2011, 129 (06): 3846-3855
[6] Modeling Binaural Unmasking of Speech Using a Blind Binaural Processing Stage [J].
Hauth, Christopher F.;
Berning, Simon C.;
Kollmeier, Birger;
Brand, Thomas.
TRENDS IN HEARING, 2020, 24
[7] Hermansky H, 2013, INT CONF ACOUST SPEE, P7423, DOI 10.1109/ICASSP.2013.6639105
[8] Hohmann V, 2002, ACTA ACUST UNITED AC, V88, P433
[9] Single-ended prediction of listening effort using deep neural networks [J].
Huber, Rainer;
Krueger, Melanie;
Meyer, Bernd T.
HEARING RESEARCH, 2018, 359: 40-49
[10] The Hearing-Aid Speech Perception Index (HASPI) [J].
Kates, James M.;
Arehart, Kathryn H.
SPEECH COMMUNICATION, 2014, 65: 75-93