E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition

Cited by: 21
Authors
Almadhor, Ahmad [1 ]
Irfan, Rizwana [2 ]
Gao, Jiechao [3 ]
Saleem, Nasir [4 ]
Rauf, Hafiz Tayyab [5 ]
Kadry, Seifedine [6 ,7 ,8 ]
Affiliations
[1] Jouf Univ, Coll Comp & Informat Sci, Dept Comp Engn & Networks, Sakakah, Saudi Arabia
[2] Univ Jeddah, Coll Comp & Informat Technol Khulais, Dept Informat Technol, Jeddah 21959, Saudi Arabia
[3] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22904 USA
[4] Gomal Univ, Dept Elect Engn, FET, Dera Ismail Khan, Pakistan
[5] Staffordshire Univ, Ctr Smart Syst AI & Cybersecur, Stoke On Trent ST4 2DE, England
[6] Noroff Univ Coll, Dept Appl Data Sci, N-4612 Kristiansand, Norway
[7] Ajman Univ, Artificial Intelligence Res Ctr AIRC, POB 346, Ajman, U Arab Emirates
[8] Lebanese Amer Univ, Dept Elect & Comp Engn, POB 13, Byblos 5053, Lebanon
Keywords
Dysarthria; Dysarthric ASR; Speech intelligibility; Words error; Multi-head transformer; CNN; FEATURES; SYSTEM;
DOI
10.1016/j.eswa.2023.119797
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dysarthria is a motor speech disorder caused by weakness in the muscles and organs involved in articulation, which reduces the speech intelligibility of affected individuals. Because this condition is linked to physical exhaustion and disability, individuals not only have communication difficulties but also have difficulty interacting with digital devices. Automatic speech recognition (ASR) makes an important difference for individuals with dysarthria, since modern digital devices offer a better interaction medium that enables them to interact with their community and with computers. Still, ASR technologies perform poorly in recognizing dysarthric speech, particularly acute dysarthria. Dysarthric ASR technologies face multiple challenges, including dysarthric phoneme inaccuracy and labeling imperfection. This paper proposes a spatio-temporal dysarthric ASR (DASR) system that uses a Spatial Convolutional Neural Network (SCNN) and a Multi-Head Attention Transformer (MHAT) to visually extract speech features, so that the DASR learns the shapes of phonemes pronounced by dysarthric individuals. This visual DASR feature modeling eliminates the phoneme-related challenges. The UA-Speech database, which includes speakers with different speech intelligibility levels, is utilized in this paper. However, because the proportion of usable speech data to the number of distinctive classes in the UA-Speech database is small, the proposed DASR system leverages transfer learning to generate synthetic visual features. In benchmarking against the other DASRs examined in this study, the proposed DASR system improved recognition accuracy by 20.72% on the UA-Speech database. The largest improvements were achieved for very-low (25.75%) and low (33.67%) intelligibility speech.
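The abstract's MHAT component applies multi-head self-attention over time frames of a speech feature map. As an illustration only (the record does not give the paper's architecture details), the following minimal NumPy sketch shows the attention mechanism named in the abstract; the frame count, feature dimension, head count, and random projection weights are all placeholder assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    """Minimal multi-head self-attention over spectrogram frames.

    X: (T, D) array of T time frames with D spectral features.
    Random projections stand in for learned weights (illustration only).
    """
    T, D = X.shape
    assert D % num_heads == 0
    d_head = D // num_heads
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)    # (T, T) frame-to-frame scores
        heads.append(softmax(scores, axis=-1) @ V[:, s])  # (T, d_head) per-head context
    return np.concatenate(heads, axis=1)                  # (T, D) merged heads

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 64))  # e.g. 50 frames x 64 mel bins (assumed sizes)
out = multi_head_attention(frames, num_heads=8, rng=rng)
print(out.shape)  # (50, 64)
```

Each head attends over all time frames with its own subspace of the features, which is what lets a transformer capture the temporal ("spatio-temporal") dependencies the abstract attributes to the SCNN + MHAT pairing.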
Pages: 12
Related papers
34 in total
  • [1] Automatic Assessment of Sentence-Level Dysarthria Intelligibility Using BLSTM
    Bhat, Chitralekha
    Strik, Helmer
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (02) : 322 - 330
  • [2] Evaluation of an Automatic Speech Recognition Platform for Dysarthric Speech
    Calvo, Irene
    Tropea, Peppino
    Vigano, Mauro
    Scialla, Maria
    Cavalcante, Agnieszka B.
    Grajzer, Monika
    Gilardone, Marco
    Corbo, Massimo
    [J]. FOLIA PHONIATRICA ET LOGOPAEDICA, 2021, 73 (05) : 432 - 441
  • [3] A Weighted Speaker-Specific Confusion Transducer-Based Augmentative and Alternative Speech Communication Aid for Dysarthric Speakers
    Celin, T. A. Mariya
    Rachel, G. Anushiya
    Nagarajan, T.
    Vijayalakshmi, P.
    [J]. IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2019, 27 (02) : 187 - 197
  • [4] Representation Learning Based Speech Assistive System for Persons With Dysarthria
    Chandrakala, S.
    Rajeswari, Natarajan
    [J]. IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2017, 25 (09) : 1510 - 1517
  • [5] Investigation of Different Time-Frequency Representations for Intelligibility Assessment of Dysarthric Speech
    Chandrashekar, H. M.
    Karjigi, Veena
    Sreedevi, N.
    [J]. IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2020, 28 (12) : 2880 - 2889
  • [6] Christensen H, 2012, 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, P1774
  • [7] Speech recognition with artificial neural networks
    Dede, Guelin
    Sazli, Murat Huesnue
    [J]. DIGITAL SIGNAL PROCESSING, 2010, 20 (03) : 763 - 768
  • [8] Dong LH, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5884, DOI 10.1109/ICASSP.2018.8462506
  • [9] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [10] Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer
    Gonzalvo, Xavi
    Tazari, Siamak
    Chan, Chun-an
    Becker, Markus
    Gutkin, Alexander
    Silen, Hanna
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2238 - 2242