UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Cited by: 0
Authors
Liang, Shuang [1 ]
Miao, Chenfeng [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol, Shenzhen, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speech synthesis; unsupervised; instance discriminator; information bottleneck;
DOI
10.1109/ICASSP39728.2021.9414220
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, which makes data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter out style-irrelevant information from the representations, which improves speech quality and style similarity. Our method is able to produce desirable speech from a fairly small dataset on which the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method is able to learn meaningful style features with only 50 training samples per style.
Pages: 6583-6587
Number of pages: 5
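To make the two mechanisms described in the abstract concrete, the sketch below illustrates (i) an instance-discrimination objective that treats every training utterance as its own class, so the style encoder must give each utterance a distinct embedding, and (ii) an information bottleneck, realized here as a KL-penalized low-dimensional variational embedding. This is a minimal toy example, not the authors' implementation: the PyTorch framing, the GRU encoder, the memory-bank form of instance discrimination, the KL form of the bottleneck, and all names and sizes (StyleEncoder, instance_discrimination_loss, kl_bottleneck_loss) are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleEncoder(nn.Module):
    """Maps a mel-spectrogram (batch, frames, n_mels) to a style embedding (toy sketch)."""

    def __init__(self, n_mels=80, hidden=128, style_dim=16):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        # Variational bottleneck head: a low-dimensional Gaussian posterior over the
        # style embedding limits how much (style-irrelevant) information it can carry.
        self.to_mu = nn.Linear(hidden, style_dim)
        self.to_logvar = nn.Linear(hidden, style_dim)

    def forward(self, mel):
        _, h = self.rnn(mel)                  # h: (1, batch, hidden)
        h = h.squeeze(0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar


def instance_discrimination_loss(z, memory, indices, temperature=0.07):
    # InfoNCE-style objective: every training utterance is its own "class", and a
    # memory bank of per-utterance embeddings acts as the classifier weights.
    z = F.normalize(z, dim=-1)
    logits = z @ F.normalize(memory, dim=-1).t() / temperature
    return F.cross_entropy(logits, indices)


def kl_bottleneck_loss(mu, logvar):
    # KL(q(z|x) || N(0, I)): the information-bottleneck penalty on the embedding.
    return -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())


if __name__ == "__main__":
    n_utterances, n_mels, style_dim = 500, 80, 16
    encoder = StyleEncoder(n_mels=n_mels, style_dim=style_dim)
    memory = torch.randn(n_utterances, style_dim)   # illustrative (untrained) memory bank

    mel = torch.randn(4, 120, n_mels)               # fake batch: 4 utterances, 120 frames each
    idx = torch.tensor([3, 42, 7, 199])             # their utterance indices
    z, mu, logvar = encoder(mel)
    loss = instance_discrimination_loss(z, memory, idx) + 1e-2 * kl_bottleneck_loss(mu, logvar)
    loss.backward()
    print("toy loss:", float(loss))

In the paper's actual setting the style embedding would condition a GST-Tacotron-style synthesizer and the losses would be trained jointly with the reconstruction objective; here the forward pass runs on random tensors only to show the shapes and the two loss terms involved.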
Related papers (50 in total)
  • [1] Gated Recurrent Attention for Multi-Style Speech Synthesis. Cheon, Sung Jun; Lee, Joun Yeop; Choi, Byoung Jin; Lee, Hyeonseung; Kim, Nam Soo. APPLIED SCIENCES-BASEL, 2020, 10 (15).
  • [2] Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement. Song, Wei; Yue, Yanghao; Zhang, Ya-jie; Zhang, Zhengchen; Wu, Youzheng; He, Xiaodong. MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765: 132-140.
  • [3] Multi-Style Unsupervised Image Synthesis Using Generative Adversarial Nets. Lv, Guoyun; Israr, Syed Muhammad; Qi, Shengyong. IEEE ACCESS, 2021, 9: 86025-86036.
  • [4] AUTOMATIC OPTIMIZATION OF DATA PERTURBATION DISTRIBUTIONS FOR MULTI-STYLE TRAINING IN SPEECH RECOGNITION. Doulaty, Mortaza; Rose, Richard; Siohan, Olivier. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016: 21-27.
  • [5] Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios. Xie, Qicong; Li, Tao; Wang, Xinsheng; Wang, Zhichao; Xie, Lei; Yu, Guoqiao; Wan, Guanglu. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022: 66-70.
  • [6] Towards an Unsupervised Speaking Style Voice Building Framework: multi-style speaker diarization. Lorenzo-Trueba, J.; Martinez-Gonzalez, B.; Lopez-Ludena, V.; Barra-Chicote, R.; Ferreiros, J.; Yamagishi, J.; Montero, J. M. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2275-2278.
  • [7] The Recognition of Bimodal Produced Speech based on Multi-style Training. Galic, Jovan; Markovic, Branko. 2020 ZOOMING INNOVATION IN CONSUMER TECHNOLOGIES CONFERENCE (ZINC), 2020: 11-14.
  • [8] Style-Aware Contrastive Learning for Multi-Style Image Captioning. Zhou, Yucheng; Long, Guodong. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023: 2257-2267.
  • [9] Towards unsupervised text multi-style transfer with parameter-sharing scheme. Chen, Xi; Zhang, Song; Shen, Gehui; Deng, Zhi-Hong; Yun, Unil. NEUROCOMPUTING, 2021, 426: 227-234.
  • [10] A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization. Cheon, Sung Jun; Choi, Byoung Jin; Kim, Minchan; Lee, Hyeonseung; Kim, Nam Soo. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 55-59.