Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Cited by: 0
Authors
Aurobindo, S. [1 ]
Prakash, R. [1 ]
Rajeshkumar, M. [1 ]
Affiliation
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE;
DOI
10.1016/j.rineng.2025.104561
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to facilitate the automatic detection of dysarthria and the classification of its severity with deep convolutional neural networks (DCNN). Experiment E1 addresses dysarthria detection as a binary classification task with a dysarthria class and a healthy control class. Experiment E2 is a multiclass classification task that categorizes data into very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. This work also analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves, one representing low frequencies and the other representing high frequencies. With this approach, the low-frequency half of the bifurcated standard-form cepstrogram outperforms all other features, achieving a validation accuracy of 99.53% for E1 and 97.85% for E2, which surpasses existing benchmark work by 6% for E1 and by 1.7% for E2 on the TORGO dataset. The best-ranked feature from E1 and E2 was then applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
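The bifurcation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spectrogram` and `bifurcate`, the frame length, the hop size, the (frequency, time) axis layout, and the midpoint split are all assumptions made for illustration, and a plain magnitude spectrogram of a synthetic tone stands in for the paper's cepstrogram and TORGO recordings.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 bins per frame; transpose to (frequency, time)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def bifurcate(tf_image):
    """Split a (frequency, time) image at the midpoint of the frequency axis.

    Assumes row 0 is the lowest frequency bin, so the first half is the
    low-frequency image and the second half is the high-frequency image.
    """
    mid = tf_image.shape[0] // 2
    return tf_image[:mid, :], tf_image[mid:, :]

# Synthetic stand-in for a speech recording: one second of a 500 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 500 * t)

spec = spectrogram(signal)
low, high = bifurcate(spec)
print(spec.shape, low.shape, high.shape)
```

In a pipeline like the one the abstract outlines, each half would be saved as an image and passed to the classifier separately, so that the low-frequency and high-frequency inputs can be compared.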
Pages: 14