Scalogram based performance comparison of deep learning architectures for dysarthric speech detection

Cited by: 0
Authors
Shabber, Shaik Mulla [1]
Sumesh, E. P. [1]
Ramachandran, Vidhya Lavanya [2]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati 522237, Andhra Pradesh, India
[2] Middle East Coll, Dept Comp & Elect Engn, Knowledge Oasis, Muscat 124, Oman
Keywords
Dysarthric speech; Deep learning; Scalogram; Wavelet transformation; CNN; Classification; Severity
DOI
10.1007/s10462-024-11085-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dysarthria, a speech disorder commonly associated with neurological conditions, poses challenges for early detection and accurate diagnosis. This study addresses these challenges by applying preprocessing steps, such as noise reduction and normalization, to improve the quality of raw speech signals before feature extraction. Scalogram images generated through the wavelet transform capture the time-frequency characteristics of the speech signal, giving a visual representation of spectral content over time that exposes speech abnormalities related to dysarthria. Pre-trained convolutional neural network (CNN) architectures, such as VGG19, DenseNet-121, Xception, and a modified InceptionV3, were fine-tuned with specific hyperparameters using training and validation sets. Transfer learning lets these models adapt features learned on general image classification tasks to the classification of dysarthric speech signals. The study evaluates the models on two public datasets, TORGO and UA-Speech, and on a third dataset collected by the authors and verified by medical practitioners. The results show that the CNN models achieve accuracy between 90% and 99%, F1-scores between 0.95 and 0.99, and recall between 0.96 and 0.99, outperforming traditional methods in dysarthria detection. These findings highlight the effectiveness of the proposed approach, which uses deep learning and scalogram images to support earlier diagnosis and better healthcare outcomes for individuals with dysarthria.
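The scalogram step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which wavelet, scales, or normalization the paper uses, so the complex Morlet wavelet, the `w0 = 6.0` setting, peak normalization, and all function names below are assumptions chosen for illustration. The code uses only the Python standard library; a real pipeline would more likely call a wavelet library and render the magnitudes as an image before passing them to a CNN.

```python
import cmath
import math


def peak_normalize(signal):
    """Scale the signal so its largest absolute sample is 1.0 (assumed normalization)."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet at dimensionless time t (assumed wavelet choice)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scale_for_frequency(freq_hz, w0=6.0):
    """Scale (in seconds) whose Morlet centre frequency is approximately freq_hz."""
    return w0 / (2.0 * math.pi * freq_hz)


def scalogram(signal, scales, fs, w0=6.0):
    """Return |CWT| magnitudes as a list of rows, one row per scale.

    Each row has one value per input sample. Samples outside the signal
    are treated as zero, so values near the edges are less reliable.
    """
    n = len(signal)
    dt = 1.0 / fs
    rows = []
    for s in scales:
        # Truncate the wavelet at +/- 4 scale widths, where it is negligible.
        half = int(math.ceil(4.0 * s * fs))
        # Conjugated, scaled wavelet samples: psi*(k * dt / s) / sqrt(s).
        kernel = [
            morlet(k * dt / s, w0).conjugate() / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:
                    acc += signal[idx] * w
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows
```

For example, calling `scalogram(peak_normalize(samples), [scale_for_frequency(f) for f in (25.0, 50.0, 100.0)], fs=1000.0)` on a 50 Hz tone gives its largest mid-signal magnitude in the 50 Hz row. The direct double loop is slow for long recordings; an FFT-based convolution would be the usual choice at speech sampling rates.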
Pages: 27