LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8
Authors
Zhuang, Xiaobin [1]
Jiang, Tao [1]
Chou, Szu-Yu [1]
Wu, Bin [1]
Hu, Peng [1]
Lui, Simon [1]
Affiliations
[1] Tencent Music Entertainment, Shenzhen, People's Republic of China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive
DOI
10.1109/ICASSP39728.2021.9414043
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
LiteSing, proposed in this paper, is a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions consist of dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, which are proven to be related to expressiveness. The pitch and timbre features are predicted separately, avoiding interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed for pitch curve consistency. Experimental results show that, compared with a baseline model using a feed-forward Transformer, LiteSing is 1.386 times faster in inference, has 15 times fewer parameters, and achieves a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates the advantage of LiteSing over the other compared models.
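The "non-autoregressive WaveNet blocks" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common WaveNet-style gated activation (tanh filter times sigmoid gate) with a residual connection, a single channel, and a "same"-padded dilated convolution that may look at future frames. All function names and kernel values below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dilated_conv(x, kernel, dilation):
    """Non-causal, zero-padded 1-D dilated convolution (cross-correlation form).

    The kernel is centred on frame t, so both past and future frames
    contribute; the whole sequence can be processed in one pass.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):  # frames outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_residual_block(x, filter_kernel, gate_kernel, dilation):
    """Assumed WaveNet-style block: x + tanh(filter(x)) * sigmoid(gate(x))."""
    f = dilated_conv(x, filter_kernel, dilation)
    g = dilated_conv(x, gate_kernel, dilation)
    return [xi + math.tanh(fi) * sigmoid(gi) for xi, fi, gi in zip(x, f, g)]


def stack_blocks(x, dilations, filter_kernel, gate_kernel):
    """Stack several blocks, one per dilation; sequence length is preserved."""
    for d in dilations:
        x = gated_residual_block(x, filter_kernel, gate_kernel, d)
    return x
```

Because no output frame depends on a previously generated output frame, every frame in a block can be computed independently, which is the property that makes non-autoregressive inference fast. A real model would use many channels and learned weights.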
Pages: 7078-7082
Page count: 5
Related papers
50 in total
  • [1] MLP SINGER: TOWARDS RAPID PARALLEL KOREAN SINGING VOICE SYNTHESIS
    Tae, Jaesung
    Kim, Hyeongju
    Lee, Younggun
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021
  • [2] Expressive Singing Synthesis based on Unit Selection for the Singing Synthesis Challenge 2016
    Bonada, Jordi
    Umbert, Marti
    Blaauw, Merlijn
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1230 - 1234
  • [3] FGP-GAN: Fine-Grained Perception Integrated Generative Adversarial Network for Expressive Mandarin Singing Voice Synthesis
    Liu, Xin
    Zhang, Weiwei
    Zheng, Zhaohui
    Pan, Mingyang
    Wang, Rong
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (03) : 6054 - 6063
  • [4] Expressive control of singing voice synthesis using musical contexts and a parametric F0 model
    Ardaillon, Luc
    Chabot-Canet, Celine
    Roebel, Axel
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1250 - 1254
  • [5] Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
    Zhou, Shaohuan
    Lei, Shun
    You, Weiya
    Tuo, Deyi
    You, Yuren
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    INTERSPEECH 2022, 2022, : 4292 - 4296
  • [6] SINGING VOICE SYNTHESIS BASED ON GENERATIVE ADVERSARIAL NETWORKS
    Hono, Yukiya
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6955 - 6959
  • [7] SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System
    Zhao, Junchuan
    Chetwin, Low Qi Hong
    Wang, Ye
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2641 - 2653
  • [8] FAST AND HIGH-QUALITY SINGING VOICE SYNTHESIS SYSTEM BASED ON CONVOLUTIONAL NEURAL NETWORKS
    Nakamura, Kazuhiro
    Takaki, Shinji
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7239 - 7243
  • [9] Singing Voice Synthesis System for Carnatic Music
    Rajan, Ragesh M.
    2018 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2018, : 831 - 835
  • [10] MusicFace: Music-driven expressive singing face synthesis
    Liu, Pengfei
    Deng, Wenjin
    Li, Hengda
    Wang, Jintai
    Zheng, Yinglin
    Ding, Yiwei
    Guo, Xiaohu
    Zeng, Ming
    COMPUTATIONAL VISUAL MEDIA, 2024, 10 (01) : 119 - 136