Unsupervised Learning for Multi-Style Speech Synthesis with Limited Data

Times cited: 0

Authors:

Liang, Shuang [1]

Miao, Chenfeng [1]

Chen, Minchuan [1]

Ma, Jun [1]

Wang, Shaojun [1]

Xiao, Jing [1]

Affiliation:

[1] Ping An Technology, Shenzhen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

speech synthesis; unsupervised; instance discriminator; information bottleneck

DOI:

10.1109/ICASSP39728.2021.9414220

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, which makes data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We use an instance discriminator to guide a style encoder toward learning meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter style-irrelevant information out of these representations, which can improve both speech quality and style similarity. Our method produces desirable speech from a fairly small dataset on which the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we show that our method can learn meaningful style features from only 50 training samples per style.
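The abstract names two training signals but does not give their formulas. As a rough illustration only, the sketch below assumes two common textbook forms: instance discrimination as a non-parametric softmax over a memory bank (each training utterance is treated as its own class), and the information bottleneck as a variational KL penalty pulling a Gaussian style posterior toward N(0, I). All function names, the temperature, and the weight `beta` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def instance_discrimination_loss(embedding, memory_bank, index, temperature=0.1):
    """Assumed non-parametric softmax: -log P(index | embedding).

    memory_bank holds one unit-length style vector per training utterance;
    the utterance's own slot (`index`) is the positive, all others negatives.
    """
    z = l2_normalize(embedding)
    logits = [dot(z, m) / temperature for m in memory_bank]
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_denominator - logits[index]


def sample_style_embedding(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps from the style posterior."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl_to_standard_normal(mu, log_var):
    """Variational-bottleneck penalty KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def style_objective(z, memory_bank, index, mu, log_var, beta=1e-3, temperature=0.1):
    """Hypothetical combined loss: discrimination term plus weighted bottleneck term."""
    return (instance_discrimination_loss(z, memory_bank, index, temperature)
            + beta * gaussian_kl_to_standard_normal(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy memory bank with three utterances in a 2-D style space.
    bank = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0])]
    mu, log_var = [0.9, 0.1], [-4.0, -4.0]
    z = sample_style_embedding(mu, log_var, rng)
    print(f"loss for utterance 0: {style_objective(z, bank, 0, mu, log_var):.4f}")
```

In this sketch the discrimination term rewards embeddings that identify their own utterance, while `beta` trades that off against how much information the style code is allowed to carry; how the paper actually balances these terms is not stated in the abstract.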


Pages: 6583-6587

Number of pages: 5

Related records: 50 in total (entries 41-50 shown)

[41] Multi-style video stylization based on texture advection
Tang, Ying
Zhang, Yan
Shi, XiaoYing
Fan, Jing
SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (11) : 1 - 13
[42] On Speech Features Fusion, α-Integration Gaussian Modeling and Multi-Style Training for Noise Robust Speaker Classification
Venturini, A.
Zao, L.
Coelho, R.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12) : 1951 - 1964
[43] Multi-Style Shape Matching GAN for Text Images
Yuan, Honghui
Yanai, Keiji
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (04) : 505 - 514
[44] Multi-style speaker recognition database in practical conditions
Das, R.K.
Jelil, S.
Prasanna, S.R.M.
Das, Rohan Kumar (rohankd@iitg.ernet.in), 2018, Springer Science and Business Media, LLC (21) : 409 - 419
[45] Semi-Supervised Learning for Robust Emotional Speech Synthesis with Limited Data
Zhang, Jialin
Wushouer, Mairidan
Tuerhong, Gulanbaier
Wang, Hanfang
APPLIED SCIENCES-BASEL, 2023, 13 (09):
[46] Multi-Style Transfer with Discriminative Feedback on Disjoint Corpus
Goyal, Navita
Srinivasan, Balaji Vasan
Anandhavelu, N.
Sancheti, Abhilasha
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3500 - 3510
[47] Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Wang, Yuxuan
Stanton, Daisy
Zhang, Yu
Skerry-Ryan, R. J.
Battenberg, Eric
Shor, Joel
Xiao, Ying
Ren, Fei
Jia, Ye
Saurous, Rif A.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
[48] Multi-style video stylization based on texture advection
Tang, Ying
Zhang, Yan
Shi, XiaoYing
Fan, Jing
Science China (Information Sciences), 2015, 58 (11) : 90 - 102
[49] A configurable method for multi-style license plate recognition
Jiao, Jianbin
Ye, Qixiang
Huang, Qingming
PATTERN RECOGNITION, 2009, 42 (03) : 358 - 369
[50] A scalable approach to multi-style architectural modeling and verification
Wong, Stephen
Sun, Jing
Warren, Ian
Sun, Jun
ICECCS 2008: THIRTEENTH IEEE INTERNATIONAL CONFERENCE ON THE ENGINEERING OF COMPLEX COMPUTER SYSTEMS, PROCEEDINGS, 2008, : 25 - +
