Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Authors
Aurobindo, S. [1]
Prakash, R. [1]
Rajeshkumar, M. [1]
Affiliations
[1] Vellore Inst Technol, Sch Elect Engn, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; INTELLIGIBILITY; RECOGNITION; FEATURES; PHASE
DOI
10.1016/j.rineng.2025.104561
Chinese Library Classification
T [Industrial Technology]
Subject classification code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of dysarthric speech from the TORGO dataset to support automatic dysarthria detection and severity classification with deep convolutional neural networks (DCNN). Experiment E1 treats dysarthria detection as a binary classification task with a dysarthria class and a healthy control class. Experiment E2 is a multiclass classification task with very low, low, medium, and healthy classes. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. The study also analyzes the frequency behavior of the time-frequency image representations by bifurcating the standard-form images into two halves, one representing low frequencies and the other representing high frequencies. With this approach, the low-frequency half of the bifurcated standard-form cepstrogram outperforms all other features, achieving a validation accuracy of 99.53% for E1 and 97.85% for E2 on the TORGO dataset, which surpasses existing benchmark work by 6% for E1 and by 1.7% for E2. The best-ranked feature from E1 and E2 was then applied to the noise-reduced UASpeech dataset, where the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
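The bifurcation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, frame length, hop size, and the choice of a log-magnitude STFT and real cepstrum are all assumptions made for illustration, and the abstract does not state how the paper defines the split point or the axis of the cepstrogram image.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT image, shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); rows become frequency bins after transpose.
    return np.log(magnitude + 1e-10).T

def cepstrogram(log_spec):
    """Real cepstrum per frame: inverse FFT of the log-magnitude spectrum.

    The real cepstrum is symmetric, so only the first half is kept.
    """
    cepstrum = np.fft.irfft(log_spec, axis=0)
    return cepstrum[: cepstrum.shape[0] // 2]

def bifurcate(image):
    """Split an image along its vertical axis into lower and upper halves.

    Assumption: row 0 is the lowest bin, so the first half is the
    'low' half and the second is the 'high' half.
    """
    mid = image.shape[0] // 2
    return image[:mid], image[mid:]

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(tone)
spec_low, spec_high = bifurcate(spec)

ceps = cepstrogram(spec)
ceps_low, ceps_high = bifurcate(ceps)
```

In a pipeline like the one the abstract outlines, each half would be rendered as an image and passed to the classifier separately, so that the contribution of each band can be compared.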
Pages: 14