Automatic Assessment of Chinese Dysarthria Using Audio-visual Vowel Graph Attention Network

Cited by: 0
Authors
Liu, Xiaokang [1, 2]
Du, Xiaoxia [3]
Liu, Juan [1, 2]
Su, Rongfeng [6 ]
Ng, Manwa Lawrence [4 ]
Zhang, Yumei [5 ]
Yang, Yudong [6 ]
Zhao, Shaofeng [7 ]
Wang, Lan [8 ]
Yan, Nan [8 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, ICAS Key Lab Human Machine Intelligence Synergy Sy, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Shenzhen 518055, Peoples R China
[3] Beijing Boai Hosp, China Rehabil Res Ctr, Dept Neurorehabil, Beijing 100068, Peoples R China
[4] Univ Hong Kong, Div Speech & Hearing Sci, Hong Kong 999077, Peoples R China
[5] Capital Med Univ, Beijing Tiantan Hosp, Dept Rehabil Med, Beijing 100070, Peoples R China
[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[7] Sun Yat Sen Univ, Affiliated Hosp 8, Dept Rehabil Med, Shenzhen 518055, Peoples R China
[8] Chinese Acad Sci, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2025 / Vol. 33
Funding
National Natural Science Foundation of China;
Keywords
Resonant frequency; Hidden Markov models; Deep learning; Visualization; Feature extraction; Resonance; Mel frequency cepstral coefficient; Tongue; Speech processing; Data mining; Dysarthria Assessment; Vowel Graph; Graph Attention Network; SPEAKER IDENTIFICATION; SPEECH; SEVERITY; INTELLIGIBILITY; MODELS; SPACE; ARTICULATION; ACOUSTICS; DISEASE;
DOI
10.1109/TASLPRO.2025.3546562
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Automatic assessment of dysarthria remains highly challenging because of the high heterogeneity of the acoustic signals and the limited data available. Current research on automatic dysarthria assessment follows two main approaches: one combines expert features with machine learning, and the other uses data-driven deep learning to extract representations. Studies have shown that expert features can effectively account for the heterogeneity of dysarthria but may lack comprehensiveness, whereas deep learning methods excel at uncovering latent features. Integrating expert knowledge and deep learning into a neural network architecture grounded in expert knowledge may therefore benefit both interpretability and assessment performance. To this end, this paper proposes a vowel graph attention network based on audio-visual information that combines the strengths of both. First, a Vowel Graph Attention Network (VGAN) structure based on vowel space theory is designed, with two branches that mine the information in the features and the spatial correlation between vowels, respectively. Second, a feature set based on expert knowledge and deep representations is designed. Finally, visual information is incorporated into the model to further enhance its robustness and generalizability. Tested on the Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database, the method outperformed existing approaches in regression experiments targeting Frenchay scores.
Pages: 1454-1466
Number of pages: 13