Few-shot dysarthric speech recognition with text-to-speech data augmentation

Cited by: 3

Authors:

Hermann, Enno [1]

Magimai-Doss, Mathew [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

Source:

INTERSPEECH 2023 | 2023

Keywords:

automatic speech recognition; dysarthric speech; text-to-speech; few-shot learning

DOI:

10.21437/Interspeech.2023-2481

Chinese Library Classification (CLC):

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of dysarthric speech pose challenges, while recording large amounts of training data can be exhausting for patients. In this paper, we synthesise dysarthric speech with a FastSpeech 2-based multi-speaker text-to-speech (TTS) system for ASR data augmentation. We evaluate its few-shot capability by generating dysarthric speech with as few as 5 words from an unseen target speaker and then using it to train speaker-dependent ASR systems. The results indicate that, while the TTS output is not yet of sufficient quality, this could allow easy development of personalised acoustic models for new dysarthric speakers and domains in the future.

Pages: 156-160

Page count: 5
