UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Cited by: 0

Authors:

Liang, Shuang [1]

Miao, Chenfeng [1]

Chen, Minchuan [1]

Ma, Jun [1]

Wang, Shaojun [1]

Xiao, Jing [1]

Affiliations:

[1] Ping An Technology, Shenzhen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

speech synthesis; unsupervised; instance discriminator; information bottleneck

DOI:

10.1109/ICASSP39728.2021.9414220

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, making data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter out style-irrelevant information in the representations, which can improve speech quality and style similarity. Our method is able to produce desirable speech using a fairly small dataset, where the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method is able to learn meaningful style features with only 50 training samples per style.

Pages: 6583-6587

Page count: 5

