UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Authors
Liang, Shuang [1 ]
Miao, Chenfeng [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
Affiliation
[1] Ping An Technol, Shenzhen, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speech synthesis; unsupervised; instance discriminator; information bottleneck;
DOI
10.1109/ICASSP39728.2021.9414220
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, which makes data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter style-irrelevant information out of the representations, which improves speech quality and style similarity. Our method produces desirable speech from a fairly small dataset on which the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method learns meaningful style features with only 50 training samples per style.
Pages: 6583-6587
Number of pages: 5