Few-shot dysarthric speech recognition with text-to-speech data augmentation

Cited by: 2

Authors:

Hermann, Enno [1]

Magimai-Doss, Mathew [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

Source:

INTERSPEECH 2023 | 2023

Keywords:

automatic speech recognition; dysarthric speech; text-to-speech; few-shot learning;

DOI:

10.21437/Interspeech.2023-2481

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of dysarthric speech pose challenges, while recording large amounts of training data can be exhausting for patients. In this paper, we synthesise dysarthric speech with a FastSpeech 2-based multi-speaker text-to-speech (TTS) system for ASR data augmentation. We evaluate its few-shot capability by generating dysarthric speech with as few as 5 words from an unseen target speaker and then using it to train speaker-dependent ASR systems. The results indicated that, while the TTS output is not yet of sufficient quality, this could allow easy development of personalised acoustic models for new dysarthric speakers and domains in the future.

Pages: 156-160

Page count: 5
