Non-intrusive binaural speech recognition prediction for hearing aid processing

Cited by: 0

Authors:

Rossbach, Jana [1]

Westhausen, Nils L. [1]

Kayser, Hendrik [2]

Meyer, Bernd T. [1]

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Commun Acoust & Cluster Excellence Hearing4all, Oldenburg, Germany

[2] Carl von Ossietzky Univ Oldenburg, Auditory Signal Proc & Hearing Devices & Cluster E, Oldenburg, Germany

Source:

SPEECH COMMUNICATION | 2025 / Vol. 170

Keywords:

Speech recognition prediction; Binaural; Non-intrusive; Deep neural network; INTELLIGIBILITY; CHALLENGE; NOISE;

DOI:

10.1016/j.specom.2025.103202

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Hearing aids (HAs) often feature different signal processing algorithms to optimize speech recognition (SR) in a given acoustic environment. In this paper, we explore whether models that predict the SR performance of hearing-impaired (HI), aided users can be used to automatically select the best algorithm. To this end, SR experiments are conducted with 19 HI subjects who are aided with an open-source HA. Listeners' SR is measured in virtual, complex acoustic scenes with two distinct noise conditions using the different speech enhancement strategies implemented in this HA. For model-based selection, we apply a PHOneme-based Binaural Intelligibility model (PHOBI) based on our previous work and extended with a component for simulating hearing loss. The non-intrusive model utilizes a deep neural network to predict phone probabilities; the deterioration of these phone representations in the presence of noise or general signal degradation is quantified and used as model output. The PHOBI model is trained with 960 h of English speech signals, a broad range of noise signals and room impulse responses. The performance of model-based algorithm selection is measured with two metrics: (i) its ability to rank the HA algorithms in the order of subjective SR results and (ii) the SR difference between the measured best algorithm and the model-based selection (ΔSR). Results are compared to selections obtained with one non-intrusive and two intrusive models. PHOBI outperforms the non-intrusive and one of the intrusive models in both noise conditions, achieving significantly higher correlations (r = 0.63 and 0.80). ΔSR scores are significantly lower (better) compared to the non-intrusive baseline (3.5% and 4.6% against 8.6% and 9.8%, respectively). In terms of ΔSR, the results of PHOBI and the intrusive models are not statistically different, although PHOBI operates on the observed signal alone and does not require a clean reference signal.
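The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the algorithm names, the numbers, and the helper names `pearson_r` and `sr_gap` are hypothetical, and the use of a Pearson correlation is an assumption, since the abstract does not state which correlation type was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def sr_gap(measured, predicted):
    """SR difference between the measured best algorithm and the
    algorithm the model would select (0 means the model picked the best)."""
    best_measured = max(measured.values())
    selected = max(predicted, key=predicted.get)  # model-based selection
    return best_measured - measured[selected]


# Hypothetical data: measured SR in percent and model scores per algorithm.
measured = {"alg_a": 62.0, "alg_b": 70.0, "alg_c": 55.0}
predicted = {"alg_a": 0.58, "alg_b": 0.52, "alg_c": 0.40}

names = sorted(measured)
r = pearson_r([measured[k] for k in names], [predicted[k] for k in names])
gap = sr_gap(measured, predicted)

print(f"correlation r = {r:.2f}, SR gap = {gap:.1f} percentage points")
```

In this made-up example the model prefers `alg_a` although `alg_b` was measured as best, so the gap is the difference between their measured SR values; a model that always selects the measured best algorithm would yield a gap of zero.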


Pages: 10
