Prediction of speech intelligibility based on an auditory preprocessing model

Cited by: 48

Authors:

Christiansen, Claus [1]

Pedersen, Michael Syskind [2]

Dau, Torsten [1]

Affiliations:

[1] Tech Univ Denmark, Dept Elect Engn, Ctr Appl Hearing Res, DK-2800 Lyngby, Denmark

[2] Oticon AS, DK-2765 Smorum, Denmark

Source:

SPEECH COMMUNICATION | 2010 / Vol. 52 / Issue 7-8

Keywords:

Speech intelligibility; Auditory processing model; Ideal binary mask; Speech intelligibility index; Speech transmission index; SHORT-TERM ADAPTATION; RECEPTION THRESHOLD; TRANSMISSION INDEX; QUALITY ASSESSMENT; FLUCTUATING NOISE; ITU STANDARD; NERVE; MODULATION; MASKING; SENTENCES;

DOI:

10.1016/j.specom.2010.03.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Classical speech intelligibility models, such as the speech transmission index (STI) and the speech intelligibility index (SII), are based on calculations on the physical acoustic signals. The present study predicts speech intelligibility by combining a psychoacoustically validated model of auditory preprocessing [Dau et al., 1997. J. Acoust. Soc. Am. 102, 2892-2905] with a simple central stage that describes the similarity of the test signal with the corresponding reference signal at the level of the internal representation of the signals. The model was compared with previous approaches, whereby a speech-in-noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech-in-noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary masks degenerate to a noise vocoder. © 2010 Elsevier B.V. All rights reserved.
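The idea of a central stage that compares a test signal with its reference at the level of an internal representation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `internal_representation` is a crude framewise envelope and not the auditory preprocessing model of Dau et al. (1997), and the Pearson correlation used in `similarity` is only one plausible similarity measure, since the abstract does not specify one. The function names, frame length, and test signals are hypothetical.

```python
import math
import random


def internal_representation(signal, frame_len=64):
    """Crude stand-in for auditory preprocessing (an assumption, not the
    Dau et al. model): half-wave rectify and average within short frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append(sum(max(x, 0.0) for x in frame) / frame_len)
    return frames


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # similarity is undefined for a constant representation
    return cov / math.sqrt(var_a * var_b)


def similarity(reference, test, frame_len=64):
    """Hypothetical central stage: correlate the two internal representations."""
    return correlation(
        internal_representation(reference, frame_len),
        internal_representation(test, frame_len),
    )


def make_reference(n=8000, fs=8000):
    """Toy 'speech-like' reference: a 500 Hz tone with 4 Hz amplitude modulation."""
    return [
        (1.0 + math.sin(2 * math.pi * 4 * t / fs)) * math.sin(2 * math.pi * 500 * t / fs)
        for t in range(n)
    ]


def add_noise(signal, noise_rms, seed=0):
    """Add white Gaussian noise with the given RMS to the signal."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_rms) for x in signal]


if __name__ == "__main__":
    reference = make_reference()
    for noise_rms in (0.1, 1.0, 5.0):
        test = add_noise(reference, noise_rms)
        print(noise_rms, round(similarity(reference, test), 3))
```

In this toy setup the similarity value drops as the noise level rises. A full predictor would still need a mapping from similarity to intelligibility scores, which the abstract says was trained on speech-in-noise data; that step is not sketched here.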


Pages: 678-692

Page count: 15

53 references in total

[1]

[Anonymous], 1960, Experiments in Hearing

[2]

ANSI (American National Standards Institute), 1997, ANSI S3.5-1997

[3]

BEERENDS JG, 1992, J AUDIO ENG SOC, V40, P963

[4]

Beerends JG, 2002, J AUDIO ENG SOC, V50, P765

[5] Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation [J].

Brungart, Douglas S. ;

Chang, Peter S. ;

Simpson, Brian D. ;

Wang, DeLiang .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (06) :4007-4018

[6] ESTIMATION OF MAGNITUDE-SQUARED COHERENCE FUNCTION VIA OVERLAPPED FAST FOURIER-TRANSFORM PROCESSING [J].

CARTER, GC ;

KNAPP, CH ;

NUTTALL, AH .

IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS, 1973, AU21 (04) :337-344

[7] A quantitative model of the "effective" signal processing in the auditory system .2. Simulations and measurements [J].

Dau, T ;

Puschel, D ;

Kohlrausch, A .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 99 (06) :3623-3631

[8] A quantitative model of the "effective" signal processing in the auditory system .1. Model structure [J].

Dau, T ;

Puschel, D ;

Kohlrausch, A .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 99 (06) :3615-3622

[9] Modeling auditory processing of amplitude modulation .2. Spectral and temporal integration [J].

Dau, T ;

Kollmeier, B ;

Kohlrausch, A .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1997, 102 (05) :2906-2919

[10] Modeling auditory processing of amplitude modulation .1. Detection and masking with narrow-band carriers [J].

Dau, T ;

Kollmeier, B ;

Kohlrausch, A .

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1997, 102 (05) :2892-2905
