LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Cited by: 8

Authors:

Zhuang, Xiaobin [1]

Jiang, Tao [1]

Chou, Szu-Yu [1]

Wu, Bin [1]

Hu, Peng [1]

Lui, Simon [1]

Affiliations:

[1] Tencent Music Entertainment, Shenzhen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive;

DOI:

10.1109/ICASSP39728.2021.9414043

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes LiteSing, a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts the full conditions from the musical score, and generates acoustic features from these conditions. The full conditions consist of the dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, all of which have been shown to relate to expressiveness. The pitch and timbre features are predicted separately, which avoids interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed to keep the pitch curve consistent. Experimental results show that, compared with a baseline model based on a feed-forward Transformer, LiteSing is 1.386 times faster at inference and has a parameter count 15 times smaller, while achieving a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates its advantage over the compared models.
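As a rough illustration of the three frame-level conditions named in the abstract, the sketch below groups them in one container and derives an F0 track for a parametric vocoder. It is a minimal sketch, not the authors' implementation: the `FullConditions` class, the `vocoder_f0` helper and the sample values are hypothetical, and the masking step assumes the common convention (used by WORLD-style vocoders) that unvoiced frames carry an F0 of zero. The record does not say how LiteSing itself combines these tracks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FullConditions:
    """Hypothetical container for the per-frame conditions listed in the abstract."""
    energy: List[float]    # dynamic spectrogram energy, one value per frame
    vuv: List[bool]        # voiced/unvoiced (V/UV) decision, True = voiced
    pitch_hz: List[float]  # dynamic pitch curve in Hz, one value per frame


def vocoder_f0(cond: FullConditions) -> List[float]:
    """Mask the pitch curve with the V/UV decision.

    Assumption: the vocoder expects F0 = 0.0 for unvoiced frames.
    """
    if not (len(cond.energy) == len(cond.vuv) == len(cond.pitch_hz)):
        raise ValueError("all condition tracks must have the same number of frames")
    return [f0 if voiced else 0.0 for f0, voiced in zip(cond.pitch_hz, cond.vuv)]


# Illustrative values only; they are not taken from the paper.
cond = FullConditions(
    energy=[0.1, 0.8, 0.9, 0.2],
    vuv=[False, True, True, False],
    pitch_hz=[220.0, 220.0, 223.5, 221.0],
)
f0 = vocoder_f0(cond)
```

Keeping the V/UV decision as its own track, rather than encoding it as zeros inside the pitch curve, lets a pitch predictor work on a continuous curve and leaves the masking to a single final step.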

Pages: 7078-7082

Page count: 5

