Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

Cited by: 42
Authors
Relano-Iborra, Helia [1]
May, Tobias [1]
Zaar, Johannes [1]
Scheidiger, Christoph [1]
Dau, Torsten [1]
Affiliation
[1] Tech Univ Denmark, Dept Elect Engn, Hearing Syst Grp, DK-2800 Lyngby, Denmark
Keywords
RECEPTION THRESHOLD; AMPLITUDE-MODULATION; TRANSMISSION INDEX; FREQUENCY; NOISE; MASKING; MODEL;
DOI
10.1121/1.4964505
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436-446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125-2136]. This "hybrid" model, named sEPSM(corr), is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing. © 2016 Author(s).
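The idea of a short-term correlation back end can be illustrated with a minimal sketch. It is not the authors' implementation: the published model operates on the outputs of an auditory and modulation filterbank with multi-resolution segmentation, none of which is reproduced here. The function names `pearson` and `short_time_correlation`, the non-overlapping framing, and the toy envelopes are all assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (an illustrative
    convention, not taken from the paper).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def short_time_correlation(clean_env, degraded_env, frame_len):
    """Average the per-frame correlation between two envelopes.

    Frames are non-overlapping and of hypothetical length frame_len;
    trailing samples that do not fill a frame are ignored.
    """
    if len(clean_env) != len(degraded_env):
        raise ValueError("envelopes must have the same length")
    n_frames = len(clean_env) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    scores = []
    for i in range(n_frames):
        start, stop = i * frame_len, (i + 1) * frame_len
        scores.append(pearson(clean_env[start:stop], degraded_env[start:stop]))
    return sum(scores) / n_frames


# Toy envelopes: a slow modulation, a copy of it, and an inverted copy.
clean = [1.0 + math.sin(2.0 * math.pi * 4.0 * t / 100.0) for t in range(200)]
identical = list(clean)
inverted = [2.0 - c for c in clean]

score_same = short_time_correlation(clean, identical, frame_len=50)
score_inverted = short_time_correlation(clean, inverted, frame_len=50)
```

In this toy setting an undistorted envelope yields a score close to 1 and an inverted envelope a score close to -1. A full model would still need a mapping from such a score to intelligibility, which is outside the scope of this sketch.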
Pages: 2670-2679
Number of pages: 10