Scalogram based performance comparison of deep learning architectures for dysarthric speech detection

Cited by: 0
Authors
Shabber, Shaik Mulla [1 ]
Sumesh, E. P. [1 ]
Ramachandran, Vidhya Lavanya [2 ]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati 522237, Andhra Pradesh, India
[2] Middle East Coll, Dept Comp & Elect Engn, Knowledge Oasis, Muscat 124, Oman
Keywords
Dysarthric speech; Deep learning; Scalogram; Wavelet transformation; CNN; Classification; Severity
DOI
10.1007/s10462-024-11085-7
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dysarthria, a speech disorder commonly associated with neurological conditions, poses challenges for early detection and accurate diagnosis. This study addresses these challenges by applying preprocessing steps, such as noise reduction and normalization, to enhance the quality of raw speech signals and extract relevant features. Scalogram images generated through the wavelet transform capture the time-frequency characteristics of the speech signal, offering a visual representation of the spectral content over time and providing valuable insights into speech abnormalities related to dysarthria. Fine-tuned deep learning models, including pre-trained convolutional neural network (CNN) architectures such as VGG19, DenseNet-121, Xception, and a modified InceptionV3, were optimized with specific hyperparameters using training and validation sets. Transfer learning enables these models to adapt features learned on general image classification tasks to better classify dysarthric speech signals. The study evaluates the models on two public datasets, TORGO and UA-Speech, and a third dataset collected by the authors and verified by medical practitioners. The results show that the CNN models achieve an accuracy range of 90% to 99%, an F1-score range of 0.95 to 0.99, and a recall range of 0.96 to 0.99, outperforming traditional methods in dysarthria detection. These findings highlight the effectiveness of the proposed approach, which leverages deep learning and scalogram images to advance early diagnosis and improve healthcare outcomes for individuals with dysarthria.
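The scalogram generation the abstract describes can be sketched with a minimal continuous wavelet transform. The complex Morlet wavelet, the scale range, and the two-tone test signal below are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Return |CWT| coefficients (one row per scale) using a complex Morlet wavelet."""
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the wavelet over roughly +/- 4 standard deviations at this scale.
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / np.sqrt(s)
        # Correlate the signal with the wavelet (flip + conjugate via convolve).
        out[i] = np.abs(np.convolve(signal, np.conj(wavelet)[::-1], mode="same"))
    return out

# Illustrative input: a two-tone frame standing in for 0.1 s of speech at 8 kHz.
fs = 8000
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)
S = morlet_scalogram(x, scales=np.arange(2, 32))
print(S.shape)  # (30, 800)
```

In a pipeline like the one described, such magnitude matrices would be rendered as color images and resized to the input resolution expected by the pre-trained CNN backbones before fine-tuning.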
Pages: 27
Related papers
36 references in total
[21] Ren, Zhao; Qian, Kun; Zhang, Zixing; Pandit, Vedhas; Baird, Alice; Schuller, Bjoern. Deep scalogram representations for acoustic scene classification. IEEE-CAA Journal of Automatica Sinica, 2018, 5(3): 662-669.
[22] Rioul, Olivier; Vetterli, Martin. Wavelets and signal processing. IEEE Signal Processing Magazine, 1991, 8(4): 14-38.
[23] Rudzicz, Frank; Namasivayam, Aravind Kumar; Wolff, Talya. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 2012, 46(4): 523-541.
[24] Satyasai B. 2023 3 INT C ART INT, 2023, p. 1.
[25] Shabber S.M. 2023 14 INT C COMP C, 2023, p. 1.
[26] Shabber, Shaik Mulla; Sumesh, Eratt Parameswaran. AFM signal model for dysarthric speech classification using speech biomarkers. Frontiers in Human Neuroscience, 2024, 18.
[27] Shabber SM. 2023 INT C EL EL COM, p. 1.
[28] Shanmugapriya, P.; Mohan, V. Comparative analysis of deep learning models for dysarthric speech detection. Soft Computing, 2024, 28(6): 5683-5698.
[29] Sukanya M. Comput Sci, 2023. DOI: 10.7494/csci.2023.24.4.4924.
[30] Tan C. IEEE Trans Speech Audio Proc, 1996, 4: 377.