Automatic Assessment of Chinese Dysarthria Using Audio-visual Vowel Graph Attention Network

Times cited: 0
Authors
Liu, Xiaokang [1, 2]
Du, Xiaoxia [3]
Liu, Juan [1, 2]
Su, Rongfeng [6]
Ng, Manwa Lawrence [4]
Zhang, Yumei [5]
Yang, Yudong [6]
Zhao, Shaofeng [7]
Wang, Lan [8]
Yan, Nan [8]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, ICAS Key Lab Human Machine Intelligence Synergy Sy, Shenzhen 518055, Peoples R China
[2] Univ Chinese Acad Sci, Shenzhen 518055, Peoples R China
[3] Beijing Boai Hosp, China Rehabil Res Ctr, Dept Neurorehabil, Beijing 100068, Peoples R China
[4] Univ Hong Kong, Div Speech & Hearing Sci, Hong Kong 999077, Peoples R China
[5] Capital Med Univ, Beijing Tiantan Hosp, Dept Rehabil Med, Beijing 100070, Peoples R China
[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[7] Sun Yat Sen Univ, Affiliated Hosp 8, Dept Rehabil Med, Shenzhen 518055, Peoples R China
[8] Chinese Acad Sci, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2025, Vol. 33
Funding
National Natural Science Foundation of China
Keywords
Resonant frequency; Hidden Markov models; Deep learning; Visualization; Feature extraction; Resonance; Mel frequency cepstral coefficient; Tongue; Speech processing; Data mining; Dysarthria Assessment; Vowel Graph; Graph Attention Network; SPEAKER IDENTIFICATION; SPEECH; SEVERITY; INTELLIGIBILITY; MODELS; SPACE; ARTICULATION; ACOUSTICS; DISEASE;
DOI
10.1109/TASLPRO.2025.3546562
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic assessment of dysarthria remains highly challenging because dysarthric acoustic signals are highly heterogeneous and available data are limited. Current research on automatic dysarthria assessment follows two main approaches: expert features combined with machine learning, and data-driven deep learning methods that learn representations directly. Studies have shown that expert features can effectively account for the heterogeneity of dysarthria but may lack comprehensiveness, whereas deep learning methods excel at uncovering latent features. Building expert knowledge into a neural network architecture may therefore benefit both interpretability and assessment performance. To this end, this paper proposes an audio-visual vowel graph attention network that combines the strengths of expert knowledge and deep learning. First, a Vowel Graph Attention Network (VGAN) structure based on vowel space theory is designed; it has two branches that mine, respectively, the information within the features and the spatial correlations between vowels. Second, a feature set combining expert knowledge and deep representations is designed. Finally, visual information is incorporated into the model to further enhance its robustness and generalizability. On the Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database, the proposed method outperformed existing approaches in regression experiments predicting Frenchay scores.
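The abstract describes VGAN only at a high level. As a hedged illustration of the graph-attention idea it builds on, the sketch below applies one single-head layer of the standard GAT formulation (Veličković et al.) to a tiny graph whose nodes are vowels. It is not the authors' VGAN: the function name `gat_layer`, the fully connected vowel graph, the (F1, F2) node features, and all weights are hypothetical, and the paper's second branch, feature set, and audio-visual fusion are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply an (out_dim x in_dim) matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h: Matrix, adj: List[List[int]], W: Matrix,
              a: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Hypothetical single-head graph attention layer (generic GAT, NOT VGAN).

    h:   node features, one row per vowel node
    adj: 0/1 adjacency matrix; include self-loops so a node attends to itself
    W:   shared linear projection, shape (out_dim x in_dim)
    a:   attention vector of length 2 * out_dim
    Returns (updated node features, attention matrix).
    """
    n = len(h)
    z = [matvec(W, hi) for hi in h]      # project every node with shared W
    d = len(z[0])
    a_src, a_dst = a[:d], a[d:]           # a^T [z_i || z_j] split into two halves
    out: Matrix = []
    attn: Matrix = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]]
        # e_ij = LeakyReLU(a^T [z_i || z_j]) for each neighbour j of node i
        e = [leaky_relu(dot(a_src, z[i]) + dot(a_dst, z[j])) for j in neigh]
        alpha = softmax(e)                # normalise over neighbours only
        row = [0.0] * n
        for j, w in zip(neigh, alpha):
            row[j] = w
        attn.append(row)
        # h'_i = sum_j alpha_ij * z_j (output nonlinearity omitted for clarity)
        out.append([sum(w * z[j][k] for j, w in zip(neigh, alpha))
                    for k in range(d)])
    return out, attn


# Illustrative toy input: made-up normalised (F1, F2) values for /a/, /i/, /u/.
vowels = ["a", "i", "u"]
h = [[0.9, 0.3], [0.2, 0.9], [0.3, 0.2]]
adj = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # fully connected, with self-loops
W = [[1.0, 0.0], [0.0, 1.0]]             # identity projection for readability
a = [0.5, -0.5, 0.5, -0.5]

new_h, attn = gat_layer(h, adj, W, a)
for v, row in zip(vowels, attn):
    print(v, [round(x, 3) for x in row])
```

A quick sanity check: with an all-zero attention vector `a`, every attention weight becomes 1/3 and each output row is the mean of the projected vowel features. How the paper actually defines vowel-graph edges and fuses its two branches is not stated in this record.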
Pages: 1454-1466
Number of pages: 13