Estimation of Speech Intelligibility Using Speech Recognition Systems

Cited by: 1

Authors:

Takano, Yusuke [1]

Kondo, Kazuhiro [1]

Affiliation:

[1] Yamagata Univ, Grad Sch Sci & Engn, Yonezawa, Yamagata 9928510, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010 / Vol. E93D / No. 12

Keywords:

objective estimation; speech intelligibility; speech recognition; Japanese Diagnostic Rhyme Test; noise adaptation;

DOI:

10.1587/transinf.E93.D.3368

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We attempted to estimate subjective scores of the Japanese Diagnostic Rhyme Test (DRT), a two-to-one forced selection speech intelligibility test. We used automatic speech recognizers with language models that force one of the words in the word pair, mimicking the human recognition process of the DRT. Initial testing was done using speaker-independent models, and they showed significantly lower scores than subjective scores. The acoustic models were then adapted to each of the speakers in the corpus, and then adapted to noise at a specified SNR. Three different types of noise were tested: white noise, multi-talker (babble) noise, and pseudo-speech noise. The match between subjective and estimated scores improved significantly with noise-adapted models compared to speaker-independent models and the speaker-adapted models when the adapted noise level and the tested level match. However, when SNR conditions do not match, the recognition scores degraded, especially when tested SNR conditions were higher than the adapted noise level. Accordingly, we adapted the models to mixed levels of noise, i.e., multi-condition training. The adapted models now showed relatively high intelligibility, matching subjective intelligibility performance over all levels of noise. The correlation between subjective and estimated intelligibility scores increased to 0.94 with multi-talker noise, 0.93 with white noise, and 0.89 with pseudo-speech noise, while the root mean square error (RMSE) reduced from more than 40 to 13.10, 13.05, and 16.06, respectively.
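The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the recognizer returns one score (for example, a log-likelihood) per word of each DRT pair, and that intelligibility is expressed as plain percent correct; the abstract does not state the exact scoring formula. The function names `forced_choice`, `percent_correct`, `pearson`, and `rmse` are hypothetical, chosen only for this illustration.

```python
import math


def forced_choice(score_first, score_second):
    """Pick one word of a pair: 0 for the first word, 1 for the second.

    Assumption: a higher recognizer score means a better match.
    """
    return 0 if score_first >= score_second else 1


def percent_correct(choices, targets):
    """Percentage of word pairs for which the chosen word is the target."""
    hits = sum(1 for c, t in zip(choices, targets) if c == t)
    return 100.0 * hits / len(targets)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

In use, `percent_correct` would be computed once per noise type and SNR condition from the recognizer's forced choices, and `pearson` and `rmse` would then compare those estimated scores against the subjective scores for the same conditions.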


Pages: 3368-3376

Page count: 9
