LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8
Authors
Zhuang, Xiaobin [1]
Jiang, Tao [1]
Chou, Szu-Yu [1]
Wu, Bin [1]
Hu, Peng [1]
Lui, Simon [1]
Affiliation
[1] Tencent Music Entertainment, Shenzhen, People's Republic of China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021
Keywords
singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive
DOI
10.1109/ICASSP39728.2021.9414043
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
LiteSing, proposed in this paper, is a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model mainly stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture; it predicts full conditions from the musical score and then generates acoustic features from these conditions. In this paper, the full conditions consist of dynamic spectrogram energy, a voiced/unvoiced (V/UV) decision and a dynamic pitch curve, which are proven to be related to expressiveness. We predict the pitch and timbre features separately, avoiding interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed to keep the pitch curve consistent. Experimental results show that, compared with a baseline model based on a feed-forward Transformer, LiteSing is 1.386 times faster at inference and has 15 times fewer training parameters, while achieving a similar MOS on sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates the advantage of LiteSing over the other compared models.
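The abstract names three "full conditions" (dynamic spectrogram energy, V/UV decision and dynamic pitch curve) but does not define how they are computed. The sketch below shows one common way to derive such frame-level targets from a magnitude spectrogram and an F0 track. The per-frame L2-norm energy, the "F0 > 0 means voiced" rule and the interpolated log-F0 contour are assumptions for illustration, not LiteSing's confirmed definitions, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_energy(spectrogram: Sequence[Sequence[float]]) -> List[float]:
    # Assumed definition: energy of a frame = L2 norm of its magnitude bins.
    return [math.sqrt(sum(m * m for m in frame)) for frame in spectrogram]


def vuv_decision(f0: Sequence[float]) -> List[int]:
    # Assumed rule: a frame is voiced (1) when its F0 is positive; WORLD's
    # F0 estimators report 0 Hz for unvoiced frames.
    return [1 if hz > 0 else 0 for hz in f0]


def continuous_log_f0(f0: Sequence[float]) -> List[float]:
    # Log-F0 on voiced frames; unvoiced gaps are linearly interpolated in the
    # log domain between the nearest voiced neighbours, and leading/trailing
    # gaps copy the closest voiced value. All-unvoiced input returns zeros.
    voiced = [i for i, hz in enumerate(f0) if hz > 0]
    if not voiced:
        return [0.0] * len(f0)
    out: List[float] = []
    for i, hz in enumerate(f0):
        if hz > 0:
            out.append(math.log(hz))
            continue
        left: Optional[int] = max((v for v in voiced if v < i), default=None)
        right: Optional[int] = min((v for v in voiced if v > i), default=None)
        if left is None:
            out.append(math.log(f0[right]))
        elif right is None:
            out.append(math.log(f0[left]))
        else:
            w = (i - left) / (right - left)
            out.append((1 - w) * math.log(f0[left]) + w * math.log(f0[right]))
    return out


def full_conditions(
    spectrogram: Sequence[Sequence[float]], f0: Sequence[float]
) -> Tuple[List[float], List[int], List[float]]:
    # Hypothetical bundle of the three condition streams, one value per frame.
    return frame_energy(spectrogram), vuv_decision(f0), continuous_log_f0(f0)
```

For example, an F0 track of `[0.0, 220.0, 0.0, 440.0, 0.0]` yields V/UV flags `[0, 1, 0, 1, 0]`, and the middle unvoiced frame is filled with the log of the geometric mean of 220 Hz and 440 Hz. Interpolating log-F0 gives a model a continuous pitch curve to regress, while the separate V/UV flag still records where voicing actually stops.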
Pages: 7078-7082
Number of pages: 5