A STUDY ON THE IMPACT OF SELF-SUPERVISED LEARNING ON AUTOMATIC DYSARTHRIC SPEECH ASSESSMENT

Authors
Cadet, Xavier F. [1]
Aloufi, Ranya [1]
Ahmadi-Abhari, Sara [1]
Haddadi, Hamed [1]
Affiliations
[1] Imperial College London, London, England
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Funding
UK Research and Innovation (UKRI); Engineering and Physical Sciences Research Council (EPSRC)
Keywords
dysarthric speech; speech recognition; self-supervised learning; classification
DOI
10.1109/ICASSPW62465.2024.10626129
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automating dysarthria assessments offers the opportunity to develop practical, low-cost tools that address the current limitations of manual and subjective assessments. Nonetheless, the small size of most dysarthria datasets makes it challenging to develop automated assessment. Recent research showed that speech representations from models pre-trained on large unlabelled data can enhance Automatic Speech Recognition (ASR) performance for dysarthric speech. We are the first to evaluate the representations from pre-trained state-of-the-art self-supervised models across three downstream tasks on dysarthric speech (disease classification, word recognition, and intelligibility classification) and under three noise scenarios on the UA-Speech dataset. We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving respectively +24.7%, +61%, and +7.2% accuracy compared to classical acoustic features.
Pages: 630-634
Number of pages: 5