Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Cited by: 0

Authors:

Aurobindo, S. [1]

Prakash, R. [1]

Rajeshkumar, M. [1]

Affiliations:

[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India

Source:

RESULTS IN ENGINEERING | 2025 / Vol. 25

Keywords:

Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE;

DOI:

10.1016/j.rineng.2025.104561

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to support automatic detection of dysarthria and classification of its severity using deep convolutional neural networks (DCNN). Experiment E1 treats dysarthria detection as a binary classification task with a dysarthria class and a healthy control class. Experiment E2 is a multiclass classification task with very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. The work also analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves, one representing low frequencies and the other high frequencies. With this approach, the low-frequency half of the bifurcated standard-form cepstrogram outperforms all other features, achieving a validation accuracy of 99.53% for E1 and 97.85% for E2 and surpassing existing benchmark work by 6% for E1 and 1.7% for E2 on the TORGO dataset. The best-ranked feature from E1 and E2 was then applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
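
As a rough illustration of the bifurcation step described in the abstract, the following Python sketch computes a log-magnitude spectrogram with NumPy and splits it into two halves along the frequency axis. It is not the authors' implementation: the function names, frame length, hop size, midpoint split, and synthetic test tone are all assumptions made for illustration, and the abstract does not specify how the images were generated or exactly where they were divided.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames (one per column).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)


def bifurcate(image):
    """Split a time-frequency image at the midpoint of its frequency axis.

    Assumes row 0 is the lowest frequency bin, so the first returned array
    is the low-frequency half and the second is the high-frequency half.
    """
    mid = image.shape[0] // 2
    return image[:mid], image[mid:]


# Synthetic stand-in for a speech recording: one second of a 440 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(signal)
low, high = bifurcate(spec)
print(spec.shape, low.shape, high.shape)
```

In a pipeline like the one the abstract outlines, each half would presumably be rendered as an image and passed to the classifier separately, so that the contribution of low and high frequencies can be compared.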


Pages: 14
