Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection

Cited by: 0

Authors:

Xiang, Yan [1]

Berisha, Visar [1,2]

Liss, Julie [2]

Chakrabarti, Chaitali [1]
Affiliations:

[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA

[2] Arizona State Univ, Coll Hlth Solut, Tempe, AZ USA

Source:

INTERSPEECH 2024 | 2024

Keywords:

Dysarthria detection; speech processing; deep neural network; multi-task learning; disease

DOI:

10.21437/Interspeech.2024-1563

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech analytic models based on deep learning are popular in clinical diagnostics. However, constraints on clinical data collection and sharing place limits on available dataset sizes, which adversely impacts trained model performance. Multi-task learning (MTL) has been utilized to mitigate the effect of limited sample size by jointly training on multiple tasks that are considered to be related. However, discrepancies between clinical and non-clinical tasks can reduce MTL efficiency and can even cause it to fail, especially when there are gradient conflicts. In this paper, we enhance the performance of dysarthria detection by using MTL with an auxiliary task of learning speaker embeddings. We propose a task-specific gradient projection method to overcome gradient conflicts. Our evaluation shows that the proposed MTL paradigm outperforms both single-task learning and conventional MTL under different data availability settings.


Pages: 902-906

Page count: 5
