Predicting Automatic Speech Recognition Performance over Communication Channels from Instrumental Speech Quality and Intelligibility Scores

Cited by: 9

Authors:

Gallardo, Laura Fernandez [1]

Moeller, Sebastian [1]

Beerends, John [2]

Affiliations:

[1] TU Berlin, Qual & Usabil Lab, Telekom Innovat Labs, Berlin, Germany

[2] TNO, The Hague, Netherlands

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

automatic speech recognition; speech intelligibility; instrumental speech quality; communication channels; ITU-T STANDARD; ASSESSMENT POLQA;

DOI:

10.21437/Interspeech.2017-36

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility over transmission channels. Different to previous studies, the effects of super-wideband transmissions are analyzed and compared to those of wideband and narrowband channels. Furthermore, intelligibility scores, gathered by conducting a listening test based on logatomes, are also considered for the prediction of automatic speech recognition results. The modern instrumental measurement techniques POLQA and POLQA-based intelligibility have been respectively applied to estimate the quality and the intelligibility of transmitted speech. Based on our results, polynomial models are proposed that permit the prediction of speech recognition accuracy from the subjective and instrumental measures, involving a number of channel distortions in the three bandwidths. This approach can save the costs of performing automatic speech recognition experiments and can be seen as a first step towards a useful tool for communication channel designers.


Pages: 2939-2943

Page count: 5
