Automatic speech recognition in neurodegenerative disease

Cited by: 15
Authors
Schultz, Benjamin G. [1, 6]
Tarigoppula, Venkata S. Aditya [2, 3]
Noffs, Gustavo [1]
Rojas, Sandra [1]
van der Walt, Anneke [4]
Grayden, David B. [2, 3]
Vogel, Adam P. [1, 5]
Affiliations
[1] Univ Melbourne, Dept Audiol & Speech Pathol, Ctr Neurosci Speech, 550 Swanston St, Melbourne, Vic 3053, Australia
[2] Univ Melbourne, Dept Biomed Engn, Melbourne, Vic, Australia
[3] Univ Melbourne, ARC Training Ctr Cognit Comp Med Technol, Melbourne, Vic, Australia
[4] Monash Univ, Dept Neurosci, Cent Clin Sch, Melbourne, Vic, Australia
[5] Redenlab, Melbourne, Vic, Australia
[6] Maastricht Univ, Fac Psychol & Neurosci, Dept Neuropsychol & Psychopharmacol, Maastricht, Netherlands
Funding
Australian Research Council
Keywords
Automatic Speech Recognition; Dysarthria; Neurodegenerative disease; Augmented assistive communication technology; Communication; FRIEDREICH ATAXIA; DYSARTHRIA; INTELLIGIBILITY; AGE;
DOI
10.1007/s10772-021-09836-w
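Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract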
Automatic speech recognition (ASR) could potentially improve communication by providing transcriptions of speech in real time. ASR is particularly useful for people with progressive disorders that lead to reduced speech intelligibility or difficulties performing motor tasks. ASR services are usually trained on healthy speech and may not be optimized for impaired speech, creating a barrier for accessing augmented assistance devices. We tested the performance of three state-of-the-art ASR platforms on two groups of people with neurodegenerative disease and healthy controls. We further examined individual differences that may explain errors in ASR services within groups, such as age and sex. Speakers were recorded while reading a standard text. Speech was elicited from individuals with multiple sclerosis, Friedreich's ataxia, and healthy controls. Recordings were manually transcribed and compared to ASR transcriptions using Amazon Web Services, Google Cloud, and IBM Watson. Accuracy was measured as the proportion of words that were correctly classified. ASR accuracy was higher for controls than clinical groups, and higher for multiple sclerosis compared to Friedreich's ataxia for all ASR services. Amazon Web Services and Google Cloud yielded higher accuracy than IBM Watson. ASR accuracy decreased with increased disease duration. Age and sex did not significantly affect ASR accuracy. ASR faces challenges for people with neuromuscular disorders. Until improvements are made in recognizing less intelligible speech, the true value of ASR for people requiring augmented assistance devices and alternative communication remains unrealized. We suggest potential methods to improve ASR for those with impaired speech.
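The abstract defines accuracy as the proportion of words correctly classified when an ASR transcription is compared with the manual transcription. The sketch below shows one way such a proportion could be computed. It is not the authors' procedure: the tokenization rule, the use of Python's `difflib.SequenceMatcher` for word alignment, and the function names `tokenize` and `word_accuracy` are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(reference, hypothesis):
    """Proportion of reference words that the ASR output reproduces in order.

    reference:  manual transcription (treated as ground truth)
    hypothesis: ASR transcription
    The alignment method is an assumption, not the paper's documented method.
    """
    ref_words = tokenize(reference)
    hyp_words = tokenize(hypothesis)
    if not ref_words:
        raise ValueError("reference transcription contains no words")
    # Align the two word sequences and count the words in matching blocks.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's reading passage.
    manual = "the quick brown fox"
    asr_output = "the quick crown fox"
    print(word_accuracy(manual, asr_output))
```

A per-speaker score computed this way could then be compared across groups and ASR services. Note that this measure ignores inserted words, so it differs from the word error rate.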
Pages: 771-779
Number of pages: 9