UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Cited by: 0

Authors:

Liang, Shuang [1]

Miao, Chenfeng [1]

Chen, Minchuan [1]

Ma, Jun [1]

Wang, Shaojun [1]

Xiao, Jing [1]

Affiliation:

[1] Ping An Technol, Shenzhen, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

speech synthesis; unsupervised; instance discriminator; information bottleneck

DOI:

10.1109/ICASSP39728.2021.9414220

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, making data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter out style-irrelevant information in the representations, which can improve speech quality and style similarity. Our method is able to produce desirable speech using a fairly small dataset, where the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method is able to learn meaningful style features with only 50 training samples per style.

Pages: 6583 - 6587

Page count: 5
