Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

Cited by: 156

Authors:

Jorgensen, Soren [1]

Dau, Torsten [1]

Affiliations:

[1] Tech Univ Denmark, Dept Elect Engn, Ctr Appl Hearing Res, DK-2800 Lyngby, Denmark

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2011 / Vol. 130 / Issue 03

Keywords:

MASKING-LEVEL DIFFERENCES; AMPLITUDE-MODULATION; RECEPTION THRESHOLD; TRANSMISSION INDEX; TEMPORAL ENVELOPE; ROOM ACOUSTICS; COMPRESSION; SPECTRUM; RECOGNITION; INTENSITY;

DOI:

10.1121/1.3621502

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], which was developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation-frequency-selective process provides a key measure of speech intelligibility.
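The abstract names the central quantity, SNRenv, but does not give its formula. The sketch below is one plausible reading, not the authors' implementation. It assumes the speech envelope power is estimated as the envelope power of the noisy mixture minus that of the noise alone, and that envelope power is normalised by the squared mean of the envelope, so a sinusoidal modulation of depth m has power m^2/2. A single rectangular modulation band stands in for the modulation filterbank; peripheral filtering and the ideal-observer stage are omitted. The function names, the band edges, and the `floor` parameter are hypothetical.

```python
import math


def envelope_power(envelope, fs, f_lo, f_hi):
    """AC envelope power within [f_lo, f_hi] Hz, normalised by the squared
    mean (DC component) of the envelope. Uses a direct DFT on the bins that
    fall inside the band (assumption: a rectangular band replaces a proper
    modulation filter)."""
    n = len(envelope)
    mean = sum(envelope) / n
    power = 0.0
    # Positive-frequency bins only, excluding DC and Nyquist.
    for k in range(1, (n + 1) // 2):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            re = sum(e * math.cos(2 * math.pi * k * i / n)
                     for i, e in enumerate(envelope))
            im = sum(e * math.sin(2 * math.pi * k * i / n)
                     for i, e in enumerate(envelope))
            # One-sided power of this spectral component.
            power += 2.0 * (re * re + im * im) / (n * n)
    return power / (mean * mean)


def snr_env(env_mix, env_noise, fs, f_lo, f_hi, floor=1e-6):
    """Assumed estimate: (P_env(mix) - P_env(noise)) / P_env(noise).
    `floor` (hypothetical value) keeps both terms positive."""
    p_noise = max(envelope_power(env_noise, fs, f_lo, f_hi), floor)
    p_mix = envelope_power(env_mix, fs, f_lo, f_hi)
    p_speech = max(p_mix - p_noise, floor)
    return p_speech / p_noise


# Toy envelopes: 2 s at 100 Hz, both modulated at 4 Hz with different depths.
fs = 100
t = [i / fs for i in range(200)]
env_mix = [1 + 0.8 * math.cos(2 * math.pi * 4 * x) for x in t]
env_noise = [1 + 0.4 * math.cos(2 * math.pi * 4 * x) for x in t]
ratio = snr_env(env_mix, env_noise, fs, 2.0, 8.0)
print(10 * math.log10(ratio))  # SNRenv in dB for the 2-8 Hz band
```

In this toy case the mixture envelope power is 0.8^2/2 = 0.32 and the noise envelope power is 0.4^2/2 = 0.08, so the ratio is about 3. When the noise envelope power exceeds the mixture's speech-related part, the estimate collapses to the floor, which mirrors the abstract's account of why intelligibility is predicted to drop after spectral subtraction.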


Pages: 1475-1487

Number of pages: 13
