Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech

Cited by: 5
Authors
Radha, Kodali [1]
Bansal, Mohan [2]
Dulipalla, Venkata Rao [1]
Affiliations
[1] Velagapudi Ramakrishna Siddhartha Engn Coll, Dept Elect & Commun Engn, Kanuru 520007, Andhra Pradesh, India
[2] Indian Inst Informat Technol Sonepat, Elect & Commun Engn, IITD Techno Pk, Sonipat 131001, Haryana, India
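Keywords
Dysarthria severity level assessment; STFT layered CNN; Pre-emphasis filtering; TORGO dataset; UA-Speech dataset; Intelligibility assessment
DOI
10.1007/s00034-024-02611-7
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract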
This paper presents a novel approach for automated dysarthria detection and severity assessment using a variable short-time Fourier transform (STFT) layered convolutional neural network (CNN) model. Dysarthria is a speech disorder characterized by difficulties in articulation, resulting in unclear speech. The model is evaluated on two datasets, TORGO and UA-Speech, each comprising speakers with dysarthria and healthy controls. Several variants of the CNN's first layer are investigated: spectrogram, log spectrogram, and pre-emphasis filtering (PEF) with and without learnables. Notably, the PEF with 5 learnables achieves the highest accuracy in detecting dysarthria and assessing its severity. The study highlights the significance of dataset size, with the UA-Speech dataset showing superior performance due to its larger size, which enables better capture of dysarthria severity variations. This research contributes to the advancement of objective dysarthria assessment, aiding early diagnosis and personalized treatment for individuals with speech disorders.
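The first-layer variants named in the abstract (spectrogram, log spectrogram, pre-emphasis filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed coefficient `alpha=0.97`, the Hann window, and the frame and hop sizes are all assumptions chosen for illustration, and the learnable PEF variant is not reproduced here.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha is a fixed, assumed value; the paper's learnable variant is not modeled.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Log-power STFT using a Hann window and a naive DFT (illustration only)."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    # Slide a window over the signal; incomplete trailing frames are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        # Keep only the non-negative frequency bins of the real-valued input.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(math.log(abs(acc) ** 2 + eps))
        frames.append(bins)
    return frames
```

Calling `log_spectrogram(pre_emphasis(x))` on a raw waveform `x` yields a list of frames, each holding `frame_len // 2 + 1` log-power values; in a model of the kind the abstract describes, such a time-frequency map would feed the subsequent convolutional layers.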
Pages: 3261-3278
Number of pages: 18