LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8

Authors:

Zhuang, Xiaobin [1]

Jiang, Tao [1]

Chou, Szu-Yu [1]

Wu, Bin [1]

Hu, Peng [1]

Lui, Simon [1]

Affiliation:

[1] Tencent Music Entertainment, Shenzhen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive;

DOI:

10.1109/ICASSP39728.2021.9414043

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper proposes LiteSing, a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions consist of dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, all of which are shown to be related to expressiveness. The pitch and timbre features are predicted separately, which avoids interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is used to keep the pitch curve consistent. Experimental results show that LiteSing is 1.386 times faster at inference than a baseline model built on a feed-forward Transformer, is 15 times smaller in parameter count, and achieves a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates its advantage over the other compared models.
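As a rough illustration of how two of the conditions named in the abstract could be combined before parametric vocoding, the sketch below merges a predicted pitch curve with a V/UV decision into a single F0 contour. It is not the authors' code: the abstract does not say how LiteSing represents these conditions, so the log-F0 input, the probability-valued V/UV input, the 0.5 threshold, and the function name `f0_from_conditions` are all assumptions made for this example. The one convention taken as given is that WORLD marks unvoiced frames with an F0 of 0 Hz.

```python
import math


def f0_from_conditions(log_f0, vuv_prob, threshold=0.5):
    """Combine a predicted log-F0 curve with V/UV probabilities (hypothetical helper).

    Frames whose voicing probability falls below `threshold` are set to 0 Hz,
    the value WORLD uses for unvoiced frames. Voiced frames are converted
    from log-Hz back to Hz.
    """
    if len(log_f0) != len(vuv_prob):
        raise ValueError("log_f0 and vuv_prob must have the same number of frames")
    return [
        math.exp(lf) if p >= threshold else 0.0
        for lf, p in zip(log_f0, vuv_prob)
    ]


# Three illustrative frames: voiced near 220 Hz, unvoiced, voiced near 440 Hz.
frames = f0_from_conditions(
    [math.log(220.0), math.log(220.0), math.log(440.0)],
    [0.9, 0.1, 0.8],
)
```

Keeping the voicing decision separate from the pitch value means the pitch predictor never has to output a discontinuous curve; the jump to 0 Hz is applied only at this final masking step.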


Pages: 7078-7082

Page count: 5
