UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Cited by: 0
Authors
Liang, Shuang [1]
Miao, Chenfeng [1]
Chen, Minchuan [1]
Ma, Jun [1]
Wang, Shaojun [1]
Xiao, Jing [1]
Affiliations
[1] Ping An Technology, Shenzhen, People's Republic of China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021
Keywords
speech synthesis; unsupervised; instance discriminator; information bottleneck
DOI
10.1109/ICASSP39728.2021.9414220
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, making data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter out style-irrelevant information in the representations, which can improve speech quality and style similarity. Our method is able to produce desirable speech using a fairly small dataset, where the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method is able to learn meaningful style features with only 50 training samples per style.
Pages: 6583 - 6587
Number of pages: 5
Related papers
50 in total
  • [31] MULTI-STYLE ARTISTIC PORTRAIT DRAWING GENERATION
    Yi, Ran
    Liu, Yong-Jin
    2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2021
  • [32] MULTI-STYLE MLP FEATURES FOR BN TRANSCRIPTION
    Le, Viet-Bac
    Lamel, Lori
    Gauvain, Jean-Luc
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4866 - 4869
  • [33] MISS: An Assistant for Multi-Style Simultaneous Translation
    Li, Zuchao
    Parnow, Kevin
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2021, : 1 - 10
  • [34] Style Mixer: Semantic-aware Multi-Style Transfer Network
    Huang, Zixuan
    Zhang, Jinghuai
    Liao, Jing
    COMPUTER GRAPHICS FORUM, 2019, 38 (07) : 469 - 480
  • [35] MCLGAN: a multi-style cartoonization method based on style condition information
    Li, Canlin
    Wang, Xinyue
    Yi, Ran
    Zhang, Wenjiao
    Bi, Lihua
    Ma, Lizhuang
    VISUAL COMPUTER, 2025, 41 (04) : 2529 - 2544
  • [36] Vehicle Trajectory Tracking and Collision Avoidance Control Based on Multi-style Reinforcement Learning
    Xiao L.
    Zhang F.
    Chen L.
    Yan H.
    Ma F.
    Li S.E.
    Duan J.
    Qiche Gongcheng/Automotive Engineering, 2024, 46 (06) : 945 - 955
  • [37] Multi-style image generation based on semantic image
    Yu, Yue
    Li, Ding
    Li, Benyuan
    Li, Nengli
    VISUAL COMPUTER, 2024, 40 (05) : 3411 - 3426
  • [38] Multi-style image generation based on semantic image
    Yue Yu
    Ding Li
    Benyuan Li
    Nengli Li
    The Visual Computer, 2024, 40 : 3411 - 3426
  • [39] GAN-Based Multi-Style Photo Cartoonization
    Shu, Yezhi
    Yi, Ran
    Xia, Mengfei
    Ye, Zipeng
    Zhao, Wang
    Chen, Yang
    Lai, Yu-Kun
    Liu, Yong-Jin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28 (10) : 3376 - 3390
  • [40] UNSUPERVISED STYLE AND CONTENT SEPARATION BY MINIMIZING MUTUAL INFORMATION FOR SPEECH SYNTHESIS
    Hu, Ting-Yao
    Shrivastava, Ashish
    Tuzel, Oncel
    Dhir, Chandra
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3267 - 3271