LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8
Authors
Zhuang, Xiaobin [1]
Jiang, Tao [1]
Chou, Szu-Yu [1]
Wu, Bin [1]
Hu, Peng [1]
Lui, Simon [1]
Affiliation
[1] Tencent Music Entertainment, Shenzhen, People's Republic of China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive
DOI
10.1109/ICASSP39728.2021.9414043
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper proposes LiteSing, a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions consist of dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, all of which have been shown to relate to expressiveness. The pitch and timbre features are predicted separately, which avoids interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed to keep the pitch curve consistent. Experimental results show that, compared with a baseline model using a feed-forward Transformer, LiteSing is 1.386 times faster at inference, has 15 times fewer training parameters, and achieves a similar MOS on sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates its advantage over the other compared models.
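As a rough illustration of how a predicted V/UV decision and a predicted pitch curve might be combined before being passed to a WORLD-style vocoder, the sketch below zeroes the F0 of unvoiced frames, which is the usual WORLD convention. It is not the authors' implementation: the function name `apply_vuv`, the 0.5 threshold and the sample values are all assumptions made for this example.

```python
from typing import List


def apply_vuv(pitch_hz: List[float], vuv_prob: List[float],
              threshold: float = 0.5) -> List[float]:
    """Combine a per-frame pitch curve with a per-frame V/UV decision.

    Frames whose voicing probability falls below `threshold` are treated
    as unvoiced and get F0 = 0.0; voiced frames keep their predicted pitch.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    if len(pitch_hz) != len(vuv_prob):
        raise ValueError("pitch and V/UV sequences must have the same number of frames")
    return [f0 if p >= threshold else 0.0 for f0, p in zip(pitch_hz, vuv_prob)]


# Hypothetical per-frame predictions for four frames.
pitch = [220.0, 221.5, 223.0, 222.0]   # predicted pitch curve in Hz
vuv = [0.9, 0.8, 0.2, 0.7]             # predicted voicing probability
f0 = apply_vuv(pitch, vuv)             # third frame is unvoiced, so its F0 becomes 0.0
```

Keeping the pitch curve and the V/UV decision as separate predictions, and merging them only at this final step, mirrors the abstract's point that each condition is predicted on its own.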
Pages: 7078-7082
Number of pages: 5