UNSUPERVISED LEARNING FOR MULTI-STYLE SPEECH SYNTHESIS WITH LIMITED DATA

Cited by: 0
Authors
Liang, Shuang [1 ]
Miao, Chenfeng [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol, Shenzhen, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speech synthesis; unsupervised; instance discriminator; information bottleneck;
DOI
10.1109/ICASSP39728.2021.9414220
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Existing multi-style speech synthesis methods require either style labels or large amounts of unlabeled training data, which makes data acquisition difficult. In this paper, we present an unsupervised multi-style speech synthesis method that can be trained with limited data. We leverage an instance discriminator to guide a style encoder to learn meaningful style representations from a multi-style dataset. Furthermore, we employ an information bottleneck to filter out style-irrelevant information from the representations, which improves speech quality and style similarity. Our method is able to produce desirable speech from a fairly small dataset on which the baseline GST-Tacotron fails. ABX tests show that our model significantly outperforms GST-Tacotron on both the emotional speech synthesis task and the multi-speaker speech synthesis task. In addition, we demonstrate that our method is able to learn meaningful style features with only 50 training samples per style.
Pages: 6583-6587
Number of pages: 5
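To make the two mechanisms described in the abstract concrete, the sketch below illustrates (i) an instance-discrimination objective that treats every training utterance as its own class, so the style encoder must give each utterance a distinct embedding, and (ii) an information bottleneck, realized here as a KL-penalized low-dimensional variational embedding. This is a minimal toy example, not the authors' implementation: the PyTorch framing, the GRU encoder, the memory-bank form of instance discrimination, the KL form of the bottleneck, and all names and sizes (StyleEncoder, instance_discrimination_loss, kl_bottleneck_loss) are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleEncoder(nn.Module):
    """Maps a mel-spectrogram (batch, frames, n_mels) to a style embedding (toy sketch)."""

    def __init__(self, n_mels=80, hidden=128, style_dim=16):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
        # Variational bottleneck head: a low-dimensional Gaussian posterior over the
        # style embedding limits how much (style-irrelevant) information it can carry.
        self.to_mu = nn.Linear(hidden, style_dim)
        self.to_logvar = nn.Linear(hidden, style_dim)

    def forward(self, mel):
        _, h = self.rnn(mel)                  # h: (1, batch, hidden)
        h = h.squeeze(0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar


def instance_discrimination_loss(z, memory, indices, temperature=0.07):
    # InfoNCE-style objective: every training utterance is its own "class", and a
    # memory bank of per-utterance embeddings acts as the classifier weights.
    z = F.normalize(z, dim=-1)
    logits = z @ F.normalize(memory, dim=-1).t() / temperature
    return F.cross_entropy(logits, indices)


def kl_bottleneck_loss(mu, logvar):
    # KL(q(z|x) || N(0, I)): the information-bottleneck penalty on the embedding.
    return -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())


if __name__ == "__main__":
    n_utterances, n_mels, style_dim = 500, 80, 16
    encoder = StyleEncoder(n_mels=n_mels, style_dim=style_dim)
    memory = torch.randn(n_utterances, style_dim)   # illustrative (untrained) memory bank

    mel = torch.randn(4, 120, n_mels)               # fake batch: 4 utterances, 120 frames each
    idx = torch.tensor([3, 42, 7, 199])             # their utterance indices
    z, mu, logvar = encoder(mel)
    loss = instance_discrimination_loss(z, memory, idx) + 1e-2 * kl_bottleneck_loss(mu, logvar)
    loss.backward()
    print("toy loss:", float(loss))

In the paper's actual setting the style embedding would condition a GST-Tacotron-style synthesizer and the losses would be trained jointly with the reconstruction objective; here the forward pass runs on random tensors only to show the shapes and the two loss terms involved.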
Related papers (50 in total)
  • [1] Gated Recurrent Attention for Multi-Style Speech Synthesis. Cheon, Sung Jun; Lee, Joun Yeop; Choi, Byoung Jin; Lee, Hyeonseung; Kim, Nam Soo. APPLIED SCIENCES-BASEL, 2020, 10 (15).
  • [2] Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement. Song, Wei; Yue, Yanghao; Zhang, Ya-jie; Zhang, Zhengchen; Wu, Youzheng; He, Xiaodong. MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765: 132-140.
  • [3] Multi-Style Unsupervised Image Synthesis Using Generative Adversarial Nets. Lv, Guoyun; Israr, Syed Muhammad; Qi, Shengyong. IEEE ACCESS, 2021, 9: 86025-86036.
  • [4] AUTOMATIC OPTIMIZATION OF DATA PERTURBATION DISTRIBUTIONS FOR MULTI-STYLE TRAINING IN SPEECH RECOGNITION. Doulaty, Mortaza; Rose, Richard; Siohan, Olivier. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016: 21-27.
  • [5] Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios. Xie, Qicong; Li, Tao; Wang, Xinsheng; Wang, Zhichao; Xie, Lei; Yu, Guoqiao; Wan, Guanglu. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022: 66-70.
  • [6] Towards an Unsupervised Speaking Style Voice Building Framework: multi-style speaker diarization. Lorenzo-Trueba, J.; Martinez-Gonzalez, B.; Lopez-Ludena, V.; Barra-Chicote, R.; Ferreiros, J.; Yamagishi, J.; Montero, J. M. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2275-2278.
  • [7] The Recognition of Bimodal Produced Speech based on Multi-style Training. Galic, Jovan; Markovic, Branko. 2020 ZOOMING INNOVATION IN CONSUMER TECHNOLOGIES CONFERENCE (ZINC), 2020: 11-14.
  • [8] Style-Aware Contrastive Learning for Multi-Style Image Captioning. Zhou, Yucheng; Long, Guodong. 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023: 2257-2267.
  • [9] Towards unsupervised text multi-style transfer with parameter-sharing scheme. Chen, Xi; Zhang, Song; Shen, Gehui; Deng, Zhi-Hong; Yun, Unil. NEUROCOMPUTING, 2021, 426: 227-234.
  • [10] A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization. Cheon, Sung Jun; Choi, Byoung Jin; Kim, Minchan; Lee, Hyeonseung; Kim, Nam Soo. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 55-59.