Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

Cited by: 156
Authors
Jorgensen, Soren [1]
Dau, Torsten [1]
Affiliations
[1] Tech Univ Denmark, Dept Elect Engn, Ctr Appl Hearing Res, DK-2800 Lyngby, Denmark
Source
Keywords
MASKING-LEVEL DIFFERENCES; AMPLITUDE-MODULATION; RECEPTION THRESHOLD; TRANSMISSION INDEX; TEMPORAL ENVELOPE; ROOM ACOUSTICS; COMPRESSION; SPECTRUM; RECOGNITION; INTENSITY;
DOI
10.1121/1.3621502
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], which was developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation-frequency-selective process provides a key measure of speech intelligibility. [DOI: 10.1121/1.3621502]
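The SNRenv metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the envelope power in each modulation band has already been measured for the noisy speech and for the noise alone, estimates the speech envelope power as their difference, and combines bands by a root-sum-of-squares. The function names, the floor value, and the combination rule are all assumptions made for illustration.

```python
import math


def snr_env_per_band(p_env_mix, p_env_noise, floor=1e-3):
    """Per-band envelope SNR from envelope powers (linear units).

    p_env_mix   -- envelope power of noisy speech, one value per modulation band
    p_env_noise -- envelope power of the noise alone, same bands
    floor       -- hypothetical lower limit that keeps the ratio positive
    """
    ratios = []
    for p_mix, p_noise in zip(p_env_mix, p_env_noise):
        p_noise = max(p_noise, floor)
        # Assumption: speech envelope power is estimated as mixture minus noise.
        p_speech = max(p_mix - p_noise, floor)
        ratios.append(p_speech / p_noise)
    return ratios


def combine_bands(ratios):
    """Assumed combination rule: root-sum-of-squares across modulation bands."""
    return math.sqrt(sum(r * r for r in ratios))


def to_db(value):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(value)


if __name__ == "__main__":
    # Made-up envelope powers for three modulation bands.
    mix = [0.05, 0.04, 0.02]
    noise = [0.01, 0.02, 0.02]
    per_band = snr_env_per_band(mix, noise)
    overall = combine_bands(per_band)
    print(per_band, overall, to_db(overall))
```

In the third band of the made-up data the noise envelope power matches that of the mixture, so the estimated speech envelope power falls to the floor and the band contributes almost nothing. This mirrors the situation the abstract describes for spectral subtraction, where the estimated noise envelope power exceeds that of the speech.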
Pages: 1475-1487
Number of pages: 13