Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Cited by: 0
Authors
Min, Dongchan [1 ]
Lee, Dong Bok [1 ]
Yang, Eunho [1 ,2 ]
Hwang, Sung Ju [1 ,2 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Grad Sch AI, Seoul, South Korea
[2] AITRICS, Seoul, South Korea
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
National Research Foundation, Singapore;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech from only a few short audio samples of the given speaker. However, existing methods either require fine-tuning the model or achieve low adaptation quality without fine-tuning. In this work, we propose StyleSpeech, a new TTS model which not only synthesizes high-quality speech but also effectively adapts to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN), which aligns the gain and bias of the text input according to the style extracted from a reference speech audio. With SALN, our model effectively synthesizes speech in the style of the target speaker even from a single speech audio. Furthermore, to enhance StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes and performing episodic training. The experimental results show that our models generate high-quality speech which accurately follows the speaker's voice given a single short (1-3 sec) speech audio, significantly outperforming baselines.
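The core mechanism the abstract describes, Style-Adaptive Layer Normalization, normalizes the hidden text features and then replaces the usual fixed affine parameters with a gain and bias predicted from a style vector. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; the module name, dimensions, and the single linear predictor `affine` are assumptions for illustration.

```python
import torch
import torch.nn as nn


class SALN(nn.Module):
    """Sketch of Style-Adaptive Layer Normalization.

    Normalizes hidden features without learned affine parameters,
    then scales and shifts them with a gain and bias predicted
    from a style vector extracted from reference speech.
    """

    def __init__(self, hidden_dim: int, style_dim: int):
        super().__init__()
        # No elementwise affine: the style vector supplies gain/bias instead.
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # Hypothetical predictor mapping the style vector to (gain, bias).
        self.affine = nn.Linear(style_dim, 2 * hidden_dim)

    def forward(self, h: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim); style: (batch, style_dim)
        gain, bias = self.affine(style).chunk(2, dim=-1)
        # Broadcast the per-utterance gain/bias over the sequence dimension.
        return gain.unsqueeze(1) * self.norm(h) + bias.unsqueeze(1)
```

Because the affine parameters are conditioned on the reference audio rather than learned as constants, the same text encoder can be steered toward a new speaker's style from a single reference utterance.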
Pages: 12