On the feasibility of using a bispectral measure as a nonintrusive predictor of speech intelligibility

Times cited: 4
Authors
Hossain, Md Ekramul [1, 2]
Zilany, Muhammad S. A. [2, 3]
Davies-Venn, Evelyn [4 ]
Affiliations
[1] Univ Sydney, Complex Syst Res Grp, Fac Engn & IT, Sydney, NSW 2006, Australia
[2] Univ Malaya, Dept Biomed Engn, Kuala Lumpur 50603, Malaysia
[3] Texas A&M Univ Qatar, Elect & Comp Engn Program, Doha 23874, Qatar
[4] Univ Minnesota, Dept Speech Language & Hearing Sci, Minneapolis, MN 55455 USA
Keywords
Speech intelligibility; Spectrogram; Higher order statistics; Bispectrum; RECEPTION THRESHOLD; FLUCTUATING NOISE; INDEX; PERCEPTION; MODEL; RECOGNITION; RESPONSES; MASKING; SIGNALS; QUALITY
DOI
10.1016/j.csl.2019.02.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The presence of background noise or nonlinear distortions encountered in real-world situations often reduces the intelligibility of speech signals. Several objective measurements and prediction procedures have been developed to assess speech intelligibility in noise. Most of the existing measures are, however, suitable for only a subset of specified forms of distortion. This study developed a reliable, reference-free speech intelligibility metric that uses the properties of an acoustic signal to predict the effects of a wide range of distortions that influence speech intelligibility in quiet and noisy conditions. The bispectral speech intelligibility metric (BSIM) was developed by extracting features from the spectrogram of speech signals using third-order statistics, which are collectively known as the bispectrum. Speech intelligibility scores predicted by the BSIM were compared to behavioral speech intelligibility scores in quiet and noise. The performance of the BSIM was also compared with that of several widely used speech intelligibility metrics. Results showed that the BSIM can successfully predict the effects of nonlinear distortions, such as peak-clipping and center-clipping, as well as time domain distortions, such as phase-jitter and reverberation. Unlike existing metrics, such as the articulation index and speech transmission index, the BSIM successfully captured the effect of fluctuating noise on speech intelligibility and predicted the effects of the degradation of noisy speech processed by the ideal time-frequency segregation method. The BSIM presents a reliable, reference-free, and objective measure of speech intelligibility that can provide real-time predictions of the effect of signal processing and acoustic distortion on speech intelligibility in quiet and noise. In addition, the BSIM could be used to analyze algorithms that process noisy speech. © 2019 Elsevier Ltd. All rights reserved.
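For orientation only, the third-order statistic named in the abstract can be illustrated with a generic direct (FFT-based, segment-averaged) bispectrum estimate of a one-dimensional signal. This is a minimal sketch and not the BSIM: the record does not describe how the authors extract features from the spectrogram, and the function name `bispectrum` and the parameters `nfft` and `hop` are assumptions introduced here for illustration.

```python
import numpy as np

def bispectrum(x, nfft=64, hop=32):
    """Direct bispectrum estimate of a 1-D real signal (illustrative only).

    B[k1, k2] = average over segments of X[k1] * X[k2] * conj(X[k1 + k2]),
    where X is the windowed DFT of each segment and k1 + k2 wraps modulo nfft.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nfft)
    k = np.arange(nfft)
    # Bin index of the sum frequency f1 + f2 for every (k1, k2) pair.
    sum_idx = (k[:, None] + k[None, :]) % nfft
    acc = np.zeros((nfft, nfft), dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft]
        seg = (seg - seg.mean()) * window  # remove DC, then taper
        X = np.fft.fft(seg)
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than nfft")
    return acc / count

# Usage: three sinusoids whose third frequency and phase are the sums of
# the first two (quadratic phase coupling) at DFT bins 5, 9 and 14.
t = np.arange(1024)
x = (np.cos(2 * np.pi * 5 * t / 64)
     + np.cos(2 * np.pi * 9 * t / 64 + 0.3)
     + np.cos(2 * np.pi * 14 * t / 64 + 0.3))
B = bispectrum(x)
```

For a quadratically phase-coupled signal like the one above, the magnitude of `B` is largest at the bin pair of the two coupled frequencies and at its symmetric counterparts. That sensitivity to nonlinear interactions is the general property that makes bispectral features a candidate for detecting distortions such as clipping.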
Pages: 59-80
Number of pages: 22