Dysarthria severity classification using multi-head attention and multi-task learning

Cited by: 15
Authors
Joshy, Amlu Anna [1, 2]
Rajan, Rajeev [1, 2, 3]
Affiliations
[1] APJ Abdul Kalam Technol Univ, Coll Engn Trivandrum, Thiruvananthapuram 695016, Kerala, India
[2] APJ Abdul Kalam Technol Univ, Coll Engn Trivandrum, Elect & Commun Engn Dept, Thiruvananthapuram, India
[3] Indian Inst Technol, Dept Comp Sci & Engn, Speech & Mus Technol Lab, Madras, India
Keywords
Dysarthria; Multi-head attention; Multi-task learning; Convolutional neural network; INTELLIGIBILITY ASSESSMENT; SPEECH; DISORDERS
DOI
10.1016/j.specom.2022.12.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Identifying the severity of dysarthria is considered a diagnostic step in monitoring the patient's progress and a beneficial step in the transcription of dysarthric speech. In this paper, the effectiveness of using the multi-head attention mechanism (MHA) and the multi-task learning approach is explored for automated dysarthria severity level classification. Dysarthric speech utterances are represented by mel spectrograms and fed to a residual convolutional neural network for effective feature learning. Then the MHA module is added to identify the salient severity-highlighting periods. At the classification end, gender, age, and disorder-type identifications are employed as auxiliary tasks to share mutual information and leverage the severity classification. The performance of the proposed method is evaluated on the Universal Access Speech database. By giving a gain of 11.51% classification accuracy over the baseline system under the speaker-dependent scenario and 11.58% under the speaker-independent scenario, the proposed system demonstrates its potential for the dysarthria severity classification.
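The multi-task idea in the abstract can be illustrated with a minimal sketch: the severity loss is the primary objective, and the gender, age, and disorder-type losses are added with a smaller weight. This is not the authors' implementation. The class counts, the example logits, the `aux_weight` value, and the function names are all assumptions made for illustration; the abstract does not state how the task losses are combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[target])


def multi_task_loss(logits_by_task, targets, aux_weight=0.3, primary="severity"):
    """Weighted sum of per-task cross-entropy losses.

    The primary task has weight 1.0; every other task is treated as
    auxiliary and scaled by aux_weight (a hypothetical hyperparameter).
    """
    total = 0.0
    for task, logits in logits_by_task.items():
        weight = 1.0 if task == primary else aux_weight
        total += weight * cross_entropy(logits, targets[task])
    return total


# Hypothetical classifier outputs for one utterance. The number of
# classes per task is assumed here, not taken from the paper.
logits = {
    "severity": [2.0, 0.5, -1.0, 0.1],
    "gender": [0.3, -0.3],
    "age": [0.1, 0.9, -0.4],
    "disorder_type": [1.2, -0.2, 0.0],
}
targets = {"severity": 0, "gender": 0, "age": 1, "disorder_type": 0}

loss = multi_task_loss(logits, targets)
print(f"combined loss: {loss:.4f}")
```

Setting `aux_weight=0.0` reduces the objective to severity classification alone, which gives a simple way to compare against a single-task baseline in this sketch.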
Pages: 1-11
Page count: 11