A Study on Model Training Strategies for Speaker-Independent and Vocabulary-Mismatched Dysarthric Speech Recognition

Cited by: 0
Authors
Qi, Jinzi [1 ]
Van Hamme, Hugo [1 ]
Affiliations
[1] Katholieke Univ Leuven, KU Leuven, Dept Elect Engn ESAT PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 04
Keywords
dysarthric speech recognition; phonological features; speaker-independent; vocabulary-mismatched; FEATURES; DATABASE;
DOI
10.3390/app15042006
CLC Classification
O6 [Chemistry];
Subject Classification Code
0703
Abstract
Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder of neuromuscular origin, and accuracy declines further for unseen speakers and content. Achieving robustness in such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios while minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior research, which often focuses on limited datasets, we systematically explore training data selection strategies across diverse source types (languages, canonical vs. dysarthric, and generic vs. in-domain) in a speaker-independent setting. For the under-explored vocabulary-mismatched scenarios, we evaluate conventional methods, identify their limitations, and propose a solution that uses phonological features as intermediate representations for phone recognition to address these gaps. Experimental results demonstrate that this approach enhances recognition across dysarthric datasets in both speaker-independent and vocabulary-mismatched settings. By integrating advanced transfer learning techniques with the innovative use of phonological features, this study addresses key challenges in dysarthric speech recognition, setting a new benchmark for robustness and adaptability in the field.
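
To make the approach summarized in the abstract concrete, the sketch below is an illustrative guess in PyTorch/Hugging Face, not the authors' implementation: a pretrained Wav2Vec2.0 encoder feeds a frame-level phonological-feature head whose sigmoid-activated outputs are the only input to a phone classifier. The class name PhonologicalPhoneRecognizer, the checkpoint facebook/wav2vec2-base, and the sizes N_PHONOLOGICAL and N_PHONES are placeholders.

    # Minimal sketch of phone recognition via phonological features as an
    # intermediate representation, built on a Wav2Vec2.0 encoder
    # (assumed setup, not the paper's code).
    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    N_PHONOLOGICAL = 24   # hypothetical number of phonological features
    N_PHONES = 40         # hypothetical phone inventory size

    class PhonologicalPhoneRecognizer(nn.Module):
        def __init__(self, pretrained="facebook/wav2vec2-base"):
            super().__init__()
            self.encoder = Wav2Vec2Model.from_pretrained(pretrained)
            hidden = self.encoder.config.hidden_size
            # Multi-label detector: one logit per phonological feature per frame.
            self.feature_head = nn.Linear(hidden, N_PHONOLOGICAL)
            # Phone classifier that sees only the phonological features,
            # not the raw acoustic embeddings.
            self.phone_head = nn.Linear(N_PHONOLOGICAL, N_PHONES)

        def forward(self, input_values):
            frames = self.encoder(input_values).last_hidden_state       # (B, T, hidden)
            feat_logits = self.feature_head(frames)                     # (B, T, N_PHONOLOGICAL)
            phone_logits = self.phone_head(torch.sigmoid(feat_logits))  # (B, T, N_PHONES)
            return feat_logits, phone_logits

    # Usage: 1 s of 16 kHz audio -> frame-level feature and phone logits.
    model = PhonologicalPhoneRecognizer()
    feat_logits, phone_logits = model(torch.randn(1, 16000))

Routing phone prediction through a bottleneck of phonological features is the design idea the abstract credits for handling vocabulary mismatch: phones of unseen words decompose into the same feature inventory learned from the training vocabulary.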
Pages: 25
Related Papers
50 records in total
  • [2] DSP-based large vocabulary speaker-independent speech recognition
    Hirayama, H
    Yoshida, K
    Koga, S
    Hattori, H
    NEC RESEARCH & DEVELOPMENT, 1996, 37 (04): 528 - 534
  • [3] Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
    Yilmaz, Emre
    Mitra, Vikramjit
    Sivaraman, Ganesh
    Franco, Horacio
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 319 - 334
  • [4] Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition
    Fadhil H. T. Al-dulaimy
    王作英
    田野
    Tsinghua Science and Technology, 2002, (05): 521 - 526
  • [5] Biomimetic pattern recognition for speaker-independent speech recognition
    Qin, H
    Wang, SJ
    Sun, H
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 1290 - 1294
  • [6] Predictor codebook for speaker-independent speech recognition
    Kawabata, Takeshi
    Systems and Computers in Japan, 1994, 25 (01): 37 - 46
  • [7] SPEAKER-INDEPENDENT VOWEL RECOGNITION IN PERSIAN SPEECH
    Nazari, Mohammad
    Sayadiyan, Abolghasem
    Valiollahzadeh, Seyyed Majid
    2008 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES: FROM THEORY TO APPLICATIONS, VOLS 1-5, 2008, : 672 - 676
  • [9] Japanese Speaker-Independent Homonyms Speech Recognition
    Murakami, Jin'ichi
    Hotta, Haseo
    COMPUTATIONAL LINGUISTICS AND RELATED FIELDS, 2011, 27 : 306 - 313
  • [10] Speaker-Independent Silent Speech Recognition with Across-Speaker Articulatory Normalization and Speaker Adaptive Training
    Wang, Jun
    Hahm, Seongjun
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2415 - 2419