Investigation of Cross Modality Feature Fusion for Audio-Visual Dysarthric Speech Assessment

Times cited: 0
Authors
Jiang, Yicong [1]
Chen, Youjun [2]
Wang, Tianzi [2]
Jin, Zengrui [2]
Xie, Xurong [1]
Chen, Hui [1]
Liu, Xunying [2]
Tian, Feng [1]
Affiliations
[1] Institute of Software, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] The Chinese University of Hong Kong, Hong Kong, People's Republic of China
Source
2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP 2024) | 2024
Funding
National Natural Science Foundation of China
Keywords
dysarthria speech; automatic assessment; modality fusion; AV-HuBERT; Wav2Vec 2.0; intelligibility
DOI
10.1109/ISCSLP63861.2024.10800618
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dysarthria, a speech disorder resulting from neurological conditions, presents significant obstacles to speech intelligibility and daily communication. Automatic dysarthria assessment can provide low-cost support for the diagnosis and treatment of conditions such as Parkinson's disease, Alzheimer's disease, and stroke. This study investigates the efficacy of cross-modality feature fusion using audio-visual data for the automatic assessment of dysarthric speech. Leveraging two self-supervised learning models, AV-HuBERT and Wav2Vec 2.0, we develop a multimodal system to improve dysarthria severity classification. Using the Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) dataset, which includes synchronized audio and lip movement video recordings, our system achieves promising performance. Experimental results show that both our back-end fusion and feature fusion approaches outperform traditional single-modality methods: the best back-end fusion system achieves a speaker-level F1 score of 0.841, and the best feature-level fusion system achieves a speaker-level F1 score of 0.772. This study marks the first application of pre-trained self-supervised learning models to multi-modal dysarthria assessment, highlighting their potential to assist diagnosis and treatment.
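The abstract contrasts feature-level fusion with back-end fusion but does not specify how either is implemented. The following is only a minimal sketch of the general distinction, not the authors' method: it assumes feature-level fusion means concatenating per-utterance audio and visual embeddings, back-end fusion means a weighted average of class posteriors from two single-modality classifiers, and speaker-level decisions come from averaging utterance posteriors per speaker. All function names, the weight, and the toy values are hypothetical.

```python
import numpy as np

def feature_level_fusion(audio_feat, video_feat):
    """Concatenate audio and visual embeddings along the feature axis (assumed scheme)."""
    return np.concatenate([audio_feat, video_feat], axis=-1)

def backend_fusion(audio_probs, video_probs, audio_weight=0.5):
    """Weighted average of class posteriors from two single-modality classifiers (assumed scheme)."""
    return audio_weight * audio_probs + (1.0 - audio_weight) * video_probs

def speaker_level_labels(utt_probs, speaker_ids):
    """Average utterance posteriors per speaker, then pick the most likely class."""
    ids = np.array(speaker_ids)
    return {
        spk: int(utt_probs[ids == spk].mean(axis=0).argmax())
        for spk in sorted(set(speaker_ids))
    }

# Toy embeddings: 4 utterances, 3-dim audio and 2-dim visual features (hypothetical sizes).
audio_feat = np.ones((4, 3))
video_feat = np.zeros((4, 2))
fused = feature_level_fusion(audio_feat, video_feat)  # one joint vector per utterance

# Toy posteriors over two severity classes from each single-modality classifier.
audio_probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]])
video_probs = np.array([[0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
fused_probs = backend_fusion(audio_probs, video_probs)

speaker_ids = ["spk1", "spk1", "spk2", "spk2"]
labels = speaker_level_labels(fused_probs, speaker_ids)
print(fused.shape, labels)
```

In this sketch the feature-level path would feed `fused` to a single classifier, whereas the back-end path combines decisions only after each modality has been classified separately.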
Pages: 141-145
Number of pages: 5