Few-shot dysarthric speech recognition with text-to-speech data augmentation

Cited by: 2
Authors
Hermann, Enno [1]
Magimai-Doss, Mathew [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
INTERSPEECH 2023 | 2023
Keywords
automatic speech recognition; dysarthric speech; text-to-speech; few-shot learning
DOI
10.21437/Interspeech.2023-2481
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of dysarthric speech pose challenges, while recording large amounts of training data can be exhausting for patients. In this paper, we synthesise dysarthric speech with a FastSpeech 2-based multi-speaker text-to-speech (TTS) system for ASR data augmentation. We evaluate its few-shot capability by generating dysarthric speech with as few as 5 words from an unseen target speaker and then using it to train speaker-dependent ASR systems. The results indicate that, while the TTS output is not yet of sufficient quality, this could allow easy development of personalised acoustic models for new dysarthric speakers and domains in the future.
Pages: 156-160
Page count: 5
Related papers
50 in total
  • [1] Data Augmentation using Healthy Speech for Dysarthric Speech Recognition
    Vachhani, Bhavik
    Bhat, Chitralekha
    Kopparapu, Sunil Kumar
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 471-475
  • [2] A prototypical network for few-shot recognition of speech imagery data
    Hernandez-Galvan, Alan
    Ramirez-Alonso, Graciela
    Ramirez-Quintana, Juan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 86
  • [3] Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
    Cong-Thanh Do
    Imai, Shuhei
    Doddipatla, Rama
    Hain, Thomas
    32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, 2024: 136-140
  • [4] Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
    Huang, Sung-Feng
    Lin, Chyi-Jiunn
    Liu, Da-Rong
    Chen, Yi-Chen
    Lee, Hung-yi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1558-1571
  • [5] Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
    Choi, Seungwoo
    Han, Seungju
    Kim, Dongyoung
    Ha, Sungjoo
    INTERSPEECH 2020, 2020: 2007-2011
  • [6] Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
    Jin, Zengrui
    Geng, Mengzhe
    Deng, Jiajun
    Wang, Tianzi
    Hu, Shujie
    Li, Guinan
    Liu, Xunying
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 413-429
  • [7] You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
    Laptev, Aleksandr
    Korostik, Roman
    Svischev, Aleksey
    Andrusenko, Andrei
    Medennikov, Ivan
    Rybin, Sergey
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020: 439-444
  • [8] Pre-Finetuning for Few-Shot Emotional Speech Recognition
    Chen, Maximillian
    Yu, Zhou
    INTERSPEECH 2023, 2023: 3602-3606
  • [9] On-the-Fly Data Augmentation for Text-to-Speech Style Transfer
    Chung, Raymond
    Mak, Brian
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 634-641
  • [10] Effective Data Augmentation Methods for Neural Text-to-Speech Systems
    Oh, Suhyeon
    Kwon, Ohsung
    Hwang, Min-Jae
    Kim, Jae-Min
    Song, Eunwoo
    2022 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2022