A Survey of Automatic Speech Recognition for Dysarthric Speech

Cited by: 8
Authors
Qian, Zhaopeng [1]
Xiao, Kejing [2]
Affiliations
[1] Beijing Technology and Business University, School of Computer and Artificial Intelligence, Beijing 100048, People's Republic of China
[2] Beijing Institute of Graphic Communication, School of Information Engineering, Beijing 102600, People's Republic of China
Keywords
dysarthric speech recognition; automatic speech recognition; acoustic model; acoustic feature extraction; lexical language model; ARTICULATORY KNOWLEDGE; SPEAKER ADAPTATION; LANGUAGE; DATABASE; PARAMETERS; DISORDERS; INTENSITY; THERAPY; MODELS; STROKE
DOI
10.3390/electronics12204278
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Dysarthric speech exhibits several pathological characteristics that distinguish it from healthy speech, such as discontinuous pronunciation, uncontrolled volume, slow speaking rate, explosive pronunciation, improper pauses, excessive nasality, and airflow noise during pronunciation. Automatic speech recognition (ASR) can be of great help to speakers with dysarthria. Our research aims to provide a scoping review of ASR for dysarthric speech, covering papers in this field from 1990 to 2022. Our survey found that research on the acoustic features and the acoustic models of dysarthric speech has developed nearly in parallel. During the 2010s, deep learning technologies were widely applied to improve the performance of ASR systems. In the deep learning era, advanced methods such as convolutional neural networks, deep neural networks, and recurrent neural networks have been applied to design acoustic models as well as lexical and language models for dysarthric speech recognition, and deep learning methods are also used to extract acoustic features from dysarthric speech. Additionally, this scoping review found that speaker dependence severely limits the generalizability of acoustic models, and the scarce available dysarthric speech data falls far short of the amounts required to train data-hungry models.
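To make the modeling pipeline described in the abstract concrete, the following minimal sketch shows a hybrid convolutional-recurrent acoustic model operating on MFCC features, the general style of deep learning acoustic model the survey covers. It is a hypothetical illustration written with PyTorch/torchaudio, not an implementation from any surveyed system; the layer sizes, the 16 kHz sample rate, and the 29-token output (e.g., characters plus a CTC blank) are illustrative assumptions.

# Hypothetical sketch: a CNN + BiLSTM acoustic model over MFCC features,
# illustrating the style of deep learning acoustic model discussed in the
# survey. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio


class SimpleAcousticModel(nn.Module):
    def __init__(self, n_mfcc=40, hidden=256, n_tokens=29):
        super().__init__()
        # Convolutional front end captures local spectral-temporal patterns.
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Recurrent layers model longer-range temporal structure, e.g. the
        # slow, discontinuous articulation characteristic of dysarthria.
        self.rnn = nn.LSTM(128, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_tokens)  # per-frame token logits

    def forward(self, mfcc):                 # mfcc: (batch, n_mfcc, frames)
        x = self.conv(mfcc)                  # (batch, 128, frames)
        x, _ = self.rnn(x.transpose(1, 2))   # (batch, frames, 2 * hidden)
        return self.head(x)                  # (batch, frames, n_tokens)


# MFCC front end; 40 coefficients at 16 kHz is a common but arbitrary choice.
mfcc_transform = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)
waveform = torch.randn(1, 16000)             # one second of dummy audio
features = mfcc_transform(waveform)          # (1, 40, frames)
logits = SimpleAcousticModel()(features)
print(logits.shape)                          # (batch, frames, n_tokens)

In practice, such a frame-level model would be trained with a CTC or cross-entropy objective on a dysarthric speech corpus, which is exactly where the data scarcity and speaker dependence noted in the abstract become limiting.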
Pages: 23