BLIND ESTIMATION OF THE SPEECH TRANSMISSION INDEX FOR SPEECH QUALITY PREDICTION

Cited by: 0

Authors:

Seetharaman, Prem [1,2]

Mysore, Gautham J. [2]

Smaragdis, Paris [2,3]

Pardo, Bryan [1]

Affiliations:

[1] Northwestern Univ, Evanston, IL 60208 USA

[2] Adobe Res, San Francisco, CA 94103 USA

[3] Univ Illinois, Champaign, IL 61820 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Funding:

U.S. National Science Foundation;

Keywords:

Speech quality; speech enhancement; speech transmission index; REVERBERANT; DECAY;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The speech transmission index (STI) of a listening position within a given room indicates the quality and intelligibility of speech uttered in that room. The measure is very reliable for predicting speech intelligibility in many room conditions, but computing it requires a measurement of the room's impulse response. We present a method that uses deep convolutional neural networks to blindly estimate the STI without measuring or modeling the impulse response of the room. Our model is trained entirely on simulated room impulse responses combined with clean speech examples from the DAPS dataset [1], and it works directly on PCM audio. Our experiments show that our method predicts the true STI with a high degree of accuracy, with an average error of under 4%. It can also distinguish between different STI conditions at a level of granularity comparable to that of humans.
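As a rough illustration of how a (reverberant audio, STI label) training pair could be built from a simulated impulse response, the sketch below may help. It is not the authors' pipeline. The exponentially decaying noise impulse response, the full-band simplification (the standardized STI also averages over seven weighted octave bands), and the helper names `synth_rir`, `fullband_sti`, and `reverberate` are all assumptions made for this example. The modulation transfer function is computed with Schroeder's formula from the squared impulse response.

```python
import numpy as np

# The 14 modulation frequencies (Hz) used in the standard STI procedure.
MOD_FREQS_HZ = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                         3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def synth_rir(rt60_s, fs=16000, seed=0):
    """Hypothetical simulator: white noise with a 60 dB decay over rt60_s."""
    rng = np.random.default_rng(seed)
    n = int(1.5 * rt60_s * fs)
    t = np.arange(n) / fs
    # Amplitude envelope that falls by a factor of 10**-3 (60 dB) at t = rt60_s.
    decay = np.exp(-3.0 * np.log(10.0) * t / rt60_s)
    return rng.standard_normal(n) * decay


def fullband_sti(rir, fs=16000):
    """Simplified, full-band STI (assumption: no octave-band split, no noise)."""
    energy = rir ** 2
    t = np.arange(len(rir)) / fs
    # Schroeder's formula: m(F) = |sum h^2 e^{-j 2 pi F t}| / sum h^2
    phases = np.exp(-2j * np.pi * MOD_FREQS_HZ[:, None] * t[None, :])
    m = np.abs(phases @ energy) / energy.sum()
    m = np.clip(m, 1e-6, 1.0 - 1e-6)
    # Apparent SNR in dB, limited to +/-15 dB, then mapped to [0, 1].
    snr_db = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    return float(np.mean((snr_db + 15.0) / 30.0))


def reverberate(speech, rir):
    """Convolve clean speech with the RIR, trimmed to the input length."""
    return np.convolve(speech, rir)[: len(speech)]
```

A training example would then pair `reverberate(clean_speech, rir)` as the network input with `fullband_sti(rir)` as the regression target; a longer reverberation time should yield a lower label.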

Citation

Pages: 591-595

Page count: 5

References (20 in total)

[1] Allen, J. B.; Berkley, D. A. Image method for efficiently simulating small-room acoustics [J]. Journal of the Acoustical Society of America, 1979, 65(4): 943-950.

[2] [Anonymous], PRESENT FUTURE SPEEC

[3] [Anonymous], 2013, EUR SIGN PROC C

[4] [Anonymous], IEEE WORKSH APPL SIG

[5] Bradley, J. S.; Reich, R.; Norcross, S. G. A just noticeable difference in C50 for speech [J]. Applied Acoustics, 1999, 58(2): 99-108.

[6] Falk, Tiago H.; Zheng, Chenxi; Chan, Wai-Yip. A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(7): 1766-1774.

[7] HOUTGAST T, 1973, ACUSTICA, V28, P66

[8] Kingma D.P., 2014, INT C LEARN REP

[9] Kinoshita, Keisuke; Delcroix, Marc; Gannot, Sharon; Habets, Emanuel A. P.; Haeb-Umbach, Reinhold; Kellermann, Walter; Leutnant, Volker; Maas, Roland; Nakatani, Tomohiro; Raj, Bhiksha; Sehr, Armin; Yoshioka, Takuya. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research [J]. EURASIP Journal on Advances in Signal Processing, 2016: 1-19.

[10] Mysore, Gautham J. Can we Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?-A Dataset, Insights, and Challenges [J]. IEEE Signal Processing Letters, 2015, 22(8): 1006-1010.
