On the impact of dysarthric speech on contemporary ASR cloud platforms

Cited by: 38

Authors:

De Russis L. [1]

Corno F. [1]

Affiliations:

[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, Turin

Source:

Journal of Reliable Intelligent Environments | 2019 / Vol. 5 / Issue 3

Keywords:

Accessibility; Automatic speech recognition; Cloud platform; Comparison; Dysarthria; Speech-to-text

DOI:

10.1007/s40860-019-00085-y

Abstract:

The spread of voice-driven devices has a positive impact for people with disabilities in smart environments, since such devices allow them to perform a series of daily activities that were difficult or impossible before. As a result, their quality of life and autonomy increase. However, the speech recognition technology employed in such devices becomes limited with people having communication disorders, like dysarthria. People with dysarthria may be unable to control their smart environments, at least with the needed proficiency; this problem may negatively affect the perceived reliability of the entire environment. By exploiting the TORGO database of speech samples pronounced by people with dysarthria, this paper compares the accuracy of the dysarthric speech recognition as achieved by three speech recognition cloud platforms, namely IBM Watson Speech-to-Text, Google Cloud Speech, and Microsoft Azure Bing Speech. Such services, indeed, are used in many virtual assistants deployed in smart environments, such as Google Home. The goal is to investigate whether such cloud platforms are usable to recognize dysarthric speech, and to understand which of them is the most suitable for people with dysarthria. Results suggest that the three platforms have comparable performance in recognizing dysarthric speech and that the accuracy of the recognition is related to the speech intelligibility of the person. Overall, the platforms are limited when the dysarthric speech intelligibility is low (80–90% of word error rate), while they improve up to reach a word error rate of 15–25% for people without abnormality in their speech intelligibility. © 2019, Springer Nature Switzerland AG.
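The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is commonly computed, the Python function below takes the word-level edit distance between a reference transcript and a recognizer's output and divides it by the number of reference words. The function name, the lowercasing and whitespace tokenization, and the example sentences are illustrative assumptions; the abstract does not state the exact scoring procedure the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is lowercased and split on whitespace; no other
    normalization (punctuation, numerals) is applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from the TORGO database.
print(word_error_rate("turn on the light", "turn the night"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.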


Pages: 163-172

Page count: 9

22 references in total

[1] Ballati F., Corno F., De Russis L., Assessing virtual assistant capabilities with Italian dysarthric speech, Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’18, pp. 93-101, (2018)

[2] Ballati F., Corno F., De Russis L., Hey Siri, do you understand me?: Virtual assistants and dysarthria, Intelligent Environments 2018: Workshop Proceedings of the 14th International Conference on Intelligent Environments, pp. 557-566, (2018)

[3] Bigham J.P., Kushalnagar R., Huang T.H.K., Flores J.P., Savage S., On how deaf people might use speech to control devices, Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’17, (2017)

[4] DeRosier R., Farber R.S., Speech recognition software as an assistive device: a pilot study of user satisfaction and psychosocial impact, Work, 25, 2, pp. 125-134, (2005)

[5] Enderby P., Frenchay dysarthria assessment, Int J Lang Commun Disord, 15, 3, pp. 165-173, (1980)

[6] Glasser A.T., Kushalnagar K.R., Kushalnagar R.S., Feasibility of using automatic speech recognition with voices of deaf and hard-of-hearing individuals, Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’17, (2017)

[7] Cloud Speech-to-Text, (2018)

[8] Hawley M.S., Speech recognition as an input to electronic assistive technology, Br J Occup Therap, 65, 1, pp. 15-20, (2002)

[9] Watson Speech to Text, (2018)

[10] Joy N.M., Umesh S., Improving acoustic models in TORGO dysarthric speech database, IEEE Trans Neural Syst Rehabil Eng, 26, 3, pp. 637-645, (2018)
