Non-intrusive binaural speech recognition prediction for hearing aid processing

Cited by: 0

Authors:

Rossbach, Jana [1]

Westhausen, Nils L. [1]

Kayser, Hendrik [2]

Meyer, Bernd T. [1]

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Commun Acoust & Cluster Excellence Hearing4all, Oldenburg, Germany

[2] Carl von Ossietzky Univ Oldenburg, Auditory Signal Proc & Hearing Devices & Cluster E, Oldenburg, Germany

Source:

SPEECH COMMUNICATION | 2025 / Vol. 170

Keywords:

Speech recognition prediction; Binaural; Non-intrusive; Deep neural network; INTELLIGIBILITY; CHALLENGE; NOISE;

DOI:

10.1016/j.specom.2025.103202

Chinese Library Classification:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Hearing aids (HAs) often feature different signal processing algorithms to optimize speech recognition (SR) in a given acoustic environment. In this paper, we explore whether models that predict the SR performance of hearing-impaired (HI), aided users are applicable to automatically selecting the best algorithm. To this end, SR experiments are conducted with 19 HI subjects who are aided with an open-source HA. Listeners' SR is measured in virtual, complex acoustic scenes with two distinct noise conditions using the different speech enhancement strategies implemented in this HA. For model-based selection, we apply a PHOneme-based Binaural Intelligibility model (PHOBI) based on our previous work and extended with a component for simulating hearing loss. The non-intrusive model utilizes a deep neural network to predict phone probabilities; the deterioration of these phone representations in the presence of noise, or of signal degradation in general, is quantified and used as model output. The PHOBI model is trained with 960 h of English speech signals and a broad range of noise signals and room impulse responses. The performance of model-based algorithm selection is measured with two metrics: (i) its ability to rank the HA algorithms in the order of subjective SR results and (ii) the SR difference between the measured best algorithm and the model-based selection (ΔSR). Results are compared to selections obtained with one non-intrusive and two intrusive models. PHOBI outperforms the non-intrusive and one of the intrusive models in both noise conditions, achieving significantly higher correlations (r = 0.63 and 0.80). ΔSR scores are significantly lower (better) compared to the non-intrusive baseline (3.5% and 4.6% against 8.6% and 9.8%, respectively). The results in terms of ΔSR between PHOBI and the intrusive models are not statistically different, although PHOBI operates on the observed signal alone and does not require a clean reference signal.
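
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the algorithm labels, and all numbers below are hypothetical, and it assumes that a higher model output means better predicted SR (the abstract does not state the direction of the PHOBI output). The first function computes a Pearson correlation between measured and predicted scores; the second computes the SR difference between the measured best algorithm and the model-based selection (ΔSR).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def delta_sr(measured, predicted):
    """SR gap between the measured best algorithm and the model's pick.

    measured:  dict mapping algorithm name -> measured SR in percent
    predicted: dict mapping algorithm name -> model score
               (assumption: higher score = better predicted SR)
    """
    best_measured = max(measured.values())
    chosen = max(predicted, key=predicted.get)
    return best_measured - measured[chosen]


# Hypothetical values for three hypothetical algorithms (illustration only).
measured = {"alg_a": 62.0, "alg_b": 70.0, "alg_c": 55.0}
predicted = {"alg_a": 0.71, "alg_b": 0.64, "alg_c": 0.40}

names = sorted(measured)
r = pearson_r([measured[k] for k in names], [predicted[k] for k in names])
gap = delta_sr(measured, predicted)
print(f"r = {r:.2f}, delta SR = {gap:.1f} percentage points")
```

A ΔSR of zero means the model picked the algorithm that listeners actually performed best with; larger values indicate how much SR is lost by following the model's choice.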


Pages: 10
