Automatic Assessment of Chinese Dysarthria Using Audio-visual Vowel Graph Attention Network

Cited by: 0
Authors
Liu, Xiaokang [1 ,2 ]
Du, Xiaoxia [3 ]
Liu, Juan [1 ,2 ]
Su, Rongfeng [6 ]
Ng, Manwa Lawrence [4 ]
Zhang, Yumei [5 ]
Yang, Yudong [6 ]
Zhao, Shaofeng [7 ]
Wang, Lan [8 ]
Yan, Nan [8 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, ICAS Key Lab Human Machine Intelligence Synergy Sy, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Shenzhen 518055, Peoples R China
[3] Beijing Boai Hosp, China Rehabil Res Ctr, Dept Neurorehabil, Beijing 100068, Peoples R China
[4] Univ Hong Kong, Div Speech & Hearing Sci, Hong Kong 999077, Peoples R China
[5] Capital Med Univ, Beijing Tiantan Hosp, Dept Rehabil Med, Beijing 100070, Peoples R China
[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[7] Sun Yat Sen Univ, Affiliated Hosp 8, Dept Rehabil Med, Shenzhen 518055, Peoples R China
[8] Chinese Acad Sci, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2025, Vol. 33
Funding
National Natural Science Foundation of China;
Keywords
Resonant frequency; Hidden Markov models; Deep learning; Visualization; Feature extraction; Resonance; Mel frequency cepstral coefficient; Tongue; Speech processing; Data mining; Dysarthria Assessment; Vowel Graph; Graph Attention Network; SPEAKER IDENTIFICATION; SPEECH; SEVERITY; INTELLIGIBILITY; MODELS; SPACE; ARTICULATION; ACOUSTICS; DISEASE;
DOI
10.1109/TASLPRO.2025.3546562
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Automatic assessment of dysarthria remains highly challenging because acoustic signals are highly heterogeneous and data are limited. Current research on automatic dysarthria assessment follows two main approaches: one combines expert features with machine learning, and the other uses data-driven deep learning to extract representations. Studies have shown that expert features can effectively account for the heterogeneity of dysarthria but may lack comprehensiveness, whereas deep learning methods excel at uncovering latent features. Integrating expert knowledge and deep learning into a neural network architecture grounded in expert knowledge may therefore benefit both interpretability and assessment performance. This paper proposes a vowel graph attention network based on audio-visual information that integrates the strengths of both. First, a Vowel Graph Attention Network (VGAN) structure based on vowel space theory is designed; it has two branches, which mine the information in the features and the spatial correlation between vowels, respectively. Second, a feature set based on expert knowledge and deep representations is designed. Finally, visual information is incorporated into the model to further enhance its robustness and generalizability. Tested on the Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database, the method outperformed existing approaches in regression experiments targeting Frenchay scores.
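The abstract does not specify the internals of VGAN, so the following is only a minimal sketch of a generic single-head graph attention layer applied to a small graph whose nodes stand for vowels. It is not the authors' architecture. The function name `graph_attention_layer`, the choice of three vowel nodes, the fully connected adjacency, the feature dimensions, and the random placeholder features and weights are all assumptions made for illustration.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Element-wise LeakyReLU."""
    return np.where(x > 0, x, slope * x)


def graph_attention_layer(h, adj, W, a, slope=0.2):
    """Generic single-head graph attention (illustrative, not the paper's VGAN).

    h   : (N, F)   one feature vector per vowel node
    adj : (N, N)   0/1 adjacency; must include self-loops
    W   : (F, Fp)  shared linear projection
    a   : (2*Fp,)  attention vector
    Returns the updated node features (N, Fp) and attention weights (N, N).
    """
    z = h @ W                                   # project every node
    fp = z.shape[1]
    src = z @ a[:fp]                            # contribution of node i
    dst = z @ a[fp:]                            # contribution of neighbour j
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed via broadcasting
    e = leaky_relu(src[:, None] + dst[None, :], slope)
    e = np.where(adj > 0, e, -np.inf)           # mask non-edges
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Hypothetical setup: three vowel nodes with random placeholder features.
rng = np.random.default_rng(0)
vowels = ["a", "i", "u"]                        # assumed node labels
h = rng.normal(size=(len(vowels), 4))           # placeholder features, not real data
adj = np.ones((len(vowels), len(vowels)))       # fully connected, self-loops included
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))

out, alpha = graph_attention_layer(h, adj, W, a)
```

Each row of `alpha` holds one vowel node's attention weights over its neighbours, which is one way a model could learn how strongly vowels inform one another; how the paper's two branches actually do this is not stated in the abstract.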
Pages: 1454-1466
Page count: 13