Non-intrusive binaural speech recognition prediction for hearing aid processing

Cited by: 0

Authors:

Rossbach, Jana [1]

Westhausen, Nils L. [1]

Kayser, Hendrik [2]

Meyer, Bernd T. [1]

Affiliations:

[1] Carl von Ossietzky Univ Oldenburg, Commun Acoust & Cluster Excellence Hearing4all, Oldenburg, Germany

[2] Carl von Ossietzky Univ Oldenburg, Auditory Signal Proc & Hearing Devices & Cluster E, Oldenburg, Germany

Source:

SPEECH COMMUNICATION | 2025 / Vol. 170

Keywords:

Speech recognition prediction; Binaural; Non-intrusive; Deep neural network; INTELLIGIBILITY; CHALLENGE; NOISE

DOI:

10.1016/j.specom.2025.103202

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Hearing aids (HAs) often feature different signal processing algorithms to optimize speech recognition (SR) in a given acoustic environment. In this paper, we explore whether models that predict the SR performance of aided hearing-impaired (HI) users can be used to automatically select the best algorithm. To this end, SR experiments are conducted with 19 HI subjects aided with an open-source HA. Listeners' SR is measured in virtual, complex acoustic scenes with two distinct noise conditions, using the different speech enhancement strategies implemented in this HA. For model-based selection, we apply the PHOneme-based Binaural Intelligibility model (PHOBI), which builds on our previous work and is extended with a component that simulates hearing loss. The non-intrusive model uses a deep neural network to predict phone probabilities; the deterioration of these phone representations in the presence of noise or, more generally, signal degradation is quantified and used as the model output. The PHOBI model is trained on 960 h of English speech signals, a broad range of noise signals, and room impulse responses. The performance of model-based algorithm selection is measured with two metrics: (i) its ability to rank the HA algorithms in the order of the subjective SR results and (ii) the SR difference between the measured best algorithm and the model-based selection (ΔSR). Results are compared to selections obtained with one non-intrusive and two intrusive models. PHOBI outperforms the non-intrusive model and one of the intrusive models in both noise conditions, achieving significantly higher correlations (r = 0.63 and 0.80). Its ΔSR scores are significantly lower (better) than those of the non-intrusive baseline (3.5% and 4.6% vs. 8.6% and 9.8%, respectively). In terms of ΔSR, PHOBI and the intrusive models do not differ significantly, although PHOBI operates on the observed signal alone and does not require a clean reference signal.

Pages: 10

References: 70
