Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain

Cited by: 42
Authors
Relano-Iborra, Helia [1]
May, Tobias [1]
Zaar, Johannes [1]
Scheidiger, Christoph [1]
Dau, Torsten [1]
Affiliation
[1] Tech Univ Denmark, Dept Elect Engn, Hearing Syst Grp, DK-2800 Lyngby, Denmark
Keywords
Reception threshold; amplitude modulation; transmission index; frequency; noise; masking; model
DOI
10.1121/1.4964505
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jorgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436-446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125-2136]. This "hybrid" model, named sEPSM(corr), is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of nonlinear distortions such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model has a broader predictive range than either the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) or STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Like other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model may be valuable for evaluating the effects of a wide range of interferers and distortions on speech intelligibility, including the consequences of hearing impairment and hearing-instrument signal processing. (C) 2016 Author(s).
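The idea of a short-time correlation back end can be illustrated with a minimal sketch. It assumes that a front end (in the abstract, the mr-sEPSM auditory and modulation filtering) has already produced per-channel envelope time series for the clean and the processed speech, and it simply averages Pearson correlations over channels and fixed-length segments. This is a simplified, STOI-style stand-in, not the published sEPSM(corr) implementation: the function names `pearson` and `short_time_correlation`, the non-overlapping segmentation, and the handling of constant segments are illustrative assumptions, and STOI's normalization and clipping steps are omitted.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        # Correlation is undefined for a constant segment; scoring it as 0.0
        # is an assumption of this sketch, not part of the published model.
        return 0.0
    return cov / math.sqrt(vx * vy)


def short_time_correlation(
    clean: List[List[float]], degraded: List[List[float]], seg_len: int
) -> float:
    """Average correlation over channels and non-overlapping segments.

    clean, degraded: per-channel envelope time series (channel x time),
        assumed to come from some front end that is not implemented here.
    seg_len: number of samples per analysis segment (hypothetical parameter).
    """
    scores = []
    for c_ch, d_ch in zip(clean, degraded):
        # Step through each channel in non-overlapping windows of seg_len samples.
        for start in range(0, len(c_ch) - seg_len + 1, seg_len):
            scores.append(
                pearson(c_ch[start:start + seg_len], d_ch[start:start + seg_len])
            )
    # Mean over all channel/segment scores; 0.0 if no full segment fits.
    return sum(scores) / len(scores) if scores else 0.0
```

Because the Pearson correlation is invariant to scaling and offset, a degraded envelope that is a scaled and shifted copy of the clean one scores 1.0 in this sketch, while an inverted copy scores -1.0.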
Pages: 2670-2679
Number of pages: 10