LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8
Authors
Zhuang, Xiaobin [1]
Jiang, Tao [1]
Chou, Szu-Yu [1]
Wu, Bin [1]
Hu, Peng [1]
Lui, Simon [1]
Affiliations
[1] Tencent Music Entertainment, Shenzhen, People's Republic of China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive;
DOI
10.1109/ICASSP39728.2021.9414043
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
LiteSing, proposed in this paper, is a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions consist of the dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, all of which have been shown to be related to expressiveness. We predict the pitch and the timbre features separately, avoiding interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed for pitch curve consistency. Experimental results show that, compared with a baseline model using a feed-forward Transformer, LiteSing is 1.386 times faster in inference speed and 15 times smaller in number of training parameters, and achieves a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates the advantage of LiteSing over the compared models.
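The three "full conditions" named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the conditions are computed, so every definition here is an assumption made for illustration, not the paper's method: the function name `frame_conditions` is hypothetical, an F0 value of 0 is taken to mark an unvoiced frame (a common convention in WORLD-style analysis), the pitch curve is represented as log-F0, and the energy is taken as the L2 norm of each magnitude-spectrogram frame.

```python
import math


def frame_conditions(f0_hz, magnitudes):
    """Return one (voiced, log_f0, energy) tuple per frame.

    All definitions are illustrative assumptions, not taken from the paper:
    - voiced: True when the frame's F0 is greater than 0 Hz
    - log_f0: natural log of F0 for voiced frames, 0.0 for unvoiced frames
    - energy: L2 norm of the frame's magnitude spectrum
    """
    if len(f0_hz) != len(magnitudes):
        raise ValueError("f0_hz and magnitudes must have the same number of frames")

    conditions = []
    for f0, frame in zip(f0_hz, magnitudes):
        voiced = f0 > 0.0
        log_f0 = math.log(f0) if voiced else 0.0
        energy = math.sqrt(sum(m * m for m in frame))
        conditions.append((voiced, log_f0, energy))
    return conditions


# Two toy frames: one silent and unvoiced, one voiced at 440 Hz.
toy = frame_conditions([0.0, 440.0], [[0.0, 0.0], [3.0, 4.0]])
```

In this toy input the second frame is voiced, with an energy of 5.0 (the norm of 3 and 4). A system of the kind described would predict such per-frame quantities from the musical score rather than measure them from audio.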
Pages: 7078 - 7082
Page count: 5
Related papers
50 in total
  • [31] XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System
    Lu, Peiling
    Wu, Jie
    Luan, Jian
    Tan, Xu
    Zhou, Li
    INTERSPEECH 2020, 2020, : 1306 - 1310
  • [32] Mandarin Singing-voice Synthesis Using an HNM Based Scheme
    Gu, Hung-Yan
    Liao, Huang-Liang
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2011, 27 (01) : 303 - 317
  • [33] PITCH ADAPTIVE TRAINING FOR HMM-BASED SINGING VOICE SYNTHESIS
    Oura, Keiichiro
    Mase, Ayami
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5377 - 5380
  • [34] KaraTuner: Towards End-to-End Natural Pitch Correction for Singing Voice in Karaoke
    Zhuang, Xiaobin
    Yu, Huiran
    Zhao, Weifeng
    Jiang, Tao
    Hu, Peng
    INTERSPEECH 2022, 2022, : 4262 - 4266
  • [35] Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
    Shi, Jiatong
    Guo, Shuai
    Qian, Tao
    Huo, Nan
    Hayashi, Tomoki
    Wu, Yuning
    Xu, Frank
    Chang, Xuankai
    Li, Huazhe
    Wu, Peter
    Watanabe, Shinji
    Jin, Qin
    INTERSPEECH 2022, 2022, : 4277 - 4281
  • [36] Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System
    Hono, Yukiya
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2803 - 2815
  • [37] DNN-BASED ENSEMBLE SINGING VOICE SYNTHESIS WITH INTERACTIONS BETWEEN SINGERS
    Hyodo, Hiroaki
    Takamichi, Shinnosuke
    Nakamura, Tomohiko
    Koguchi, Junya
    Saruwatari, Hiroshi
    2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2024, : 660 - 667
  • [38] Karaoker: Alignment-free singing voice synthesis with speech training data
    Kakoulidis, Panos
    Ellinas, Nikolaos
    Vamvoukakis, Georgios
    Markopoulos, Konstantinos
    Sung, June Sig
    Jho, Gunu
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    INTERSPEECH 2022, 2022, : 2993 - 2997
  • [39] Rhythm Speech Lyrics Input for MIDI-Based Singing Voice Synthesis
    Lee, Hong-Ru
    Huang, Chih-Fang
    Hsu, Chih-Hao
    Wang, Wen-Nan
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 459 - +
  • [40] Korean Singing Voice Synthesis System based on an LSTM Recurrent Neural Network
    Kim, Juntae
    Choi, Heejin
    Park, Jinuk
    Hahn, Minsoo
    Kim, Sangjin
    Kim, Jong-Jin
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1551 - 1555