Comparative analysis of different time-frequency image representations for the detection and severity classification of dysarthric speech using deep learning

Cited by: 0
Authors
Aurobindo, S. [1]
Prakash, R. [1]
Rajeshkumar, M. [1]
Affiliations
[1] Vellore Institute of Technology, School of Electrical Engineering, Vellore, Tamil Nadu, India
Keywords
Speech features; Dysarthria; Deep convolutional neural network; Severity classification; Time-frequency image representation; Intelligibility; Recognition; Features; Phase
DOI
10.1016/j.rineng.2025.104561
CLC Number
T [Industrial Technology]
Discipline Code
08
Abstract
Dysarthria, a speech disorder resulting from neurological damage, presents significant challenges in clinical diagnosis and assessment. Traditional methods of dysarthria detection are often time-consuming and require expert interpretation. This study analyzes various time-frequency image representations of TORGO dysarthric speech to enable automatic detection and severity classification of dysarthria with deep convolutional neural networks (DCNN). Experiment E1 treats dysarthria detection as a binary classification task with a dysarthric class and a healthy-control class. Experiment E2 employs a multiclass classification scheme, categorizing speech into very-low, low, and medium severity classes plus a healthy class. The time-frequency image representations of speech features are analyzed in two forms: standard-form images, including the cepstrogram and spectrogram, and compact-form images, such as the cochleagram and mel-scalogram. The highest-ranked feature is benchmarked against existing work for both dysarthria detection and severity classification. The proposed work further analyzes the frequency behavior of the time-frequency representations by bifurcating each standard-form image into two halves, one representing low frequencies and the other high frequencies. With this approach, the bifurcated low-frequency half of the cepstrogram outperforms all other features, achieving validation accuracies of 99.53% for E1 and 97.85% for E2, surpassing the existing benchmark on the TORGO dataset by 6% for E1 and 1.7% for E2. When the best-ranked features of E1 and E2 were applied to the noise-reduced UASpeech dataset, the DCNN model achieved 98.75% accuracy in E1 and 95.98% in E2, demonstrating its effectiveness on a new dataset.
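For illustration only, since this record contains just the abstract, the following minimal Python sketch shows one plausible way to compute a standard-form time-frequency image and bifurcate it into low- and high-frequency halves as described above. The file name, sampling rate, FFT parameters, and the particular cepstrogram definition are assumptions, not the authors' published pipeline.

```python
# Minimal sketch, assuming librosa/numpy; all parameters are illustrative,
# not taken from the paper.
import numpy as np
import librosa

# Hypothetical input utterance (e.g., one TORGO recording).
y, sr = librosa.load("utterance.wav", sr=16000)

# Standard-form image 1: log-magnitude spectrogram (frequency x time).
S = np.abs(librosa.stft(y, n_fft=512, hop_length=128))
spectrogram = librosa.amplitude_to_db(S, ref=np.max)

# Standard-form image 2: one common cepstrogram definition (real cepstrum
# per frame); the paper's exact variant is not specified in this record.
cepstrogram = np.fft.irfft(np.log(S + 1e-10), axis=0)

# Bifurcation: split the frequency (or quefrency) axis into equal halves;
# each half would then be rendered as an image and fed to the DCNN.
def bifurcate(tf_image):
    n = tf_image.shape[0] // 2
    return tf_image[:n, :], tf_image[n:, :]  # (low half, high half)

cep_low, cep_high = bifurcate(cepstrogram)   # low half: the top-ranked feature
spec_low, spec_high = bifurcate(spectrogram)
print(cep_low.shape, spec_low.shape)
```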
Pages: 14