A Study on Model Training Strategies for Speaker-Independent and Vocabulary-Mismatched Dysarthric Speech Recognition

Cited by: 0
Authors
Qi, Jinzi [1 ]
Van Hamme, Hugo [1 ]
Affiliations
[1] Katholieke Univ Leuven, KU Leuven, Dept Elect Engn ESAT PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 04
Keywords
dysarthric speech recognition; phonological features; speaker-independent; vocabulary-mismatched; FEATURES; DATABASE;
DOI
10.3390/app15042006
CLC Classification
O6 [Chemistry];
Subject Classification Code
0703
Abstract
Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder of neuromuscular origin, and accuracy declines further for unseen speakers and content. Achieving robustness in such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios while minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior research, which often focuses on limited datasets, we systematically explore training data selection strategies across diverse source types (languages, canonical vs. dysarthric, and generic vs. in-domain) in a speaker-independent setting. For the under-explored vocabulary-mismatched scenarios, we evaluate conventional methods, identify their limitations, and propose a solution that uses phonological features as intermediate representations for phone recognition to address these gaps. Experimental results demonstrate that this approach enhances recognition across dysarthric datasets in both speaker-independent and vocabulary-mismatched settings. By integrating advanced transfer learning techniques with the innovative use of phonological features, this study addresses key challenges in dysarthric speech recognition, setting a new benchmark for robustness and adaptability in the field.
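
To make the approach summarized in the abstract concrete, the sketch below is an illustrative guess in PyTorch/Hugging Face, not the authors' implementation: a pretrained Wav2Vec2.0 encoder feeds a frame-level phonological-feature head whose sigmoid-activated outputs are the only input to a phone classifier. The class name PhonologicalPhoneRecognizer, the checkpoint facebook/wav2vec2-base, and the sizes N_PHONOLOGICAL and N_PHONES are placeholders.

    # Minimal sketch of phone recognition via phonological features as an
    # intermediate representation, built on a Wav2Vec2.0 encoder
    # (assumed setup, not the paper's code).
    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    N_PHONOLOGICAL = 24   # hypothetical number of phonological features
    N_PHONES = 40         # hypothetical phone inventory size

    class PhonologicalPhoneRecognizer(nn.Module):
        def __init__(self, pretrained="facebook/wav2vec2-base"):
            super().__init__()
            self.encoder = Wav2Vec2Model.from_pretrained(pretrained)
            hidden = self.encoder.config.hidden_size
            # Multi-label detector: one logit per phonological feature per frame.
            self.feature_head = nn.Linear(hidden, N_PHONOLOGICAL)
            # Phone classifier that sees only the phonological features,
            # not the raw acoustic embeddings.
            self.phone_head = nn.Linear(N_PHONOLOGICAL, N_PHONES)

        def forward(self, input_values):
            frames = self.encoder(input_values).last_hidden_state       # (B, T, hidden)
            feat_logits = self.feature_head(frames)                     # (B, T, N_PHONOLOGICAL)
            phone_logits = self.phone_head(torch.sigmoid(feat_logits))  # (B, T, N_PHONES)
            return feat_logits, phone_logits

    # Usage: 1 s of 16 kHz audio -> frame-level feature and phone logits.
    model = PhonologicalPhoneRecognizer()
    feat_logits, phone_logits = model(torch.randn(1, 16000))

Routing phone prediction through a bottleneck of phonological features is the design idea the abstract credits for handling vocabulary mismatch: phones of unseen words decompose into the same feature inventory learned from the training vocabulary.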
Pages: 25
Related Papers
50 records in total
  • [2] DSP-based large vocabulary speaker-independent speech recognition
    Hirayama, H
    Yoshida, K
    Koga, S
    Hattori, H
    NEC RESEARCH & DEVELOPMENT, 1996, 37 (04): 528 - 534
  • [3] Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
    Yilmaz, Emre
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Franco, Horacio
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 319 - 334
  • [4] Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition
    Fadhil H. T. Al-dulaimy
    王作英
    田野
    Tsinghua Science and Technology, 2002, (05): 521 - 526
  • [5] Biomimetic pattern recognition for speaker-independent speech recognition
    Qin, H
    Wang, SJ
    Sun, H
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 1290 - 1294
  • [6] Predictor codebook for speaker-independent speech recognition
    Kawabata, Takeshi
    Systems and Computers in Japan, 1994, 25 (01): 37 - 46
  • [7] SPEAKER-INDEPENDENT VOWEL RECOGNITION IN PERSIAN SPEECH
    Nazari, Mohammad
    Sayadiyan, Abolghasem
    Valiollahzadeh, Seyyed Majid
    2008 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES: FROM THEORY TO APPLICATIONS, VOLS 1-5, 2008, : 672 - 676
  • [9] Japanese Speaker-Independent Homonyms Speech Recognition
    Murakami, Jin'ichi
    Hotta, Haseo
    COMPUTATIONAL LINGUISTICS AND RELATED FIELDS, 2011, 27 : 306 - 313
  • [10] Speaker-Independent Silent Speech Recognition with Across-Speaker Articulatory Normalization and Speaker Adaptive Training
    Wang, Jun
    Hahm, Seongjun
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2415 - 2419