Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection

被引:0
作者
Xiang, Yan [1 ]
Berisha, Visar [1 ,2 ]
Liss, Julie [2 ]
Chakrabarti, Chaitali [1 ]
机构
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
[2] Arizona State Univ, Coll Hlth Solut, Tempe, AZ USA
来源
INTERSPEECH 2024 | 2024年
关键词
Dysarthria detection; speech processing; deep neural network; multi-task learning; DISEASE;
D O I
10.21437/Interspeech.2024-1563
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech analytic models based on deep learning are popular in clinical diagnostics. However, constraints on clinical data collection and sharing place limits on available dataset sizes, which adversely impacts trained model performance. Multi-task learning (MTL) has been utilized to mitigate the effect of limited sample size by jointly training on multiple tasks that are considered to be related. However, discrepancies between clinical and non-clinical tasks can reduce MTL efficiency and can even cause it to fail, especially when there are gradient conflicts. In this paper, we enhance the performance of dysarthria detection by using MTL with an auxiliary task of learning speaker embeddings. We propose a task-specific gradient projection method to overcome gradient conflicts. Our evaluation shows that the proposed MTL paradigm outperforms both single-task learning and conventional MTL under different data availability settings.
引用
收藏
页码:902 / 906
页数:5
相关论文
共 50 条
  • [31] Adaptive multi-task learning for speech to text translation
    Feng, Xin
    Zhao, Yue
    Zong, Wei
    Xu, Xiaona
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [32] Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning
    Wen, Zhengqi
    Li, Kehuang
    Huang, Zhen
    Lee, Chin-Hui
    Tao, Jianhua
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90 (07): : 1025 - 1037
  • [33] Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning
    Bhattacharjee, Mrinmoy
    Prasanna, S. R. M.
    Guha, Prithwijit
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1 - 10
  • [34] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
  • [35] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    Journal of Signal Processing Systems, 2021, 93 : 299 - 308
  • [36] IMPROVING SAR TARGET RECOGNITION WITH MULTI-TASK LEARNING
    Du, Wenrui
    Zhang, Fan
    Ma, Fei
    Yin, Qiang
    Zhou, Yongsheng
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 284 - 287
  • [37] Optimization of Action Recognition Model Based on Multi-Task Learning and Boundary Gradient
    Xu, Yiming
    Zhou, Fangjie
    Wang, Li
    Peng, Wei
    Zhang, Kai
    ELECTRONICS, 2021, 10 (19)
  • [38] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): : 299 - 308
  • [39] Multi-task learning for video anomaly detection*
    Chang, Xingya
    Zhang, Yuxin
    Xue, Dingyu
    Chen, Dongyue
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 87
  • [40] MULTI-TASK LEARNING FOR VOICE TRIGGER DETECTION
    Sigtia, Siddharth
    Clark, Pascal
    Haynes, Rob
    Richards, Hywel
    Bridle, John
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7449 - 7453