LITESING: TOWARDS FAST, LIGHTWEIGHT AND EXPRESSIVE SINGING VOICE SYNTHESIS

Times cited: 8

Authors:

Zhuang, Xiaobin [1]

Jiang, Tao [1]

Chou, Szu-Yu [1]

Wu, Bin [1]

Hu, Peng [1]

Lui, Simon [1]

Affiliation:

[1] Tencent Music Entertainment, Shenzhen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

singing voice synthesis; non-autoregressive model; generative adversarial network; lightweight; expressive

DOI:

10.1109/ICASSP39728.2021.9414043

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

LiteSing, proposed in this paper, is a high-quality singing voice synthesis (SVS) system that is fast, lightweight and expressive. The model stacks several non-autoregressive WaveNet blocks in its encoder and decoder under a generative adversarial architecture: it first predicts full conditions from the musical score and then generates acoustic features from these conditions. Here, the full conditions consist of the dynamic spectrogram energy, the voiced/unvoiced (V/UV) decision and the dynamic pitch curve, which have been shown to be related to expressiveness. We predict the pitch and timbre features separately, avoiding interdependence between the two. Instead of a neural network vocoder, a parametric WORLD vocoder is employed to preserve pitch-curve consistency. Experimental results show that, compared with a baseline model using a feed-forward Transformer, LiteSing is 1.386 times faster at inference and has 15 times fewer training parameters, while achieving a similar MOS for sound quality. In an A/B test, LiteSing achieves a 67.3% preference rate over the baseline in pitch curve and dynamic spectrogram energy prediction, which demonstrates its advantage over the other compared models.
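
To make the three "full conditions" more concrete, the following is a minimal sketch of how frame-level training targets of this kind could be derived from analysis features. It assumes one F0 value per frame (0 Hz for unvoiced frames, as WORLD-style F0 trackers report) and one magnitude-spectrum frame per frame. The helper name full_conditions, the dB energy formula and the linear log-F0 interpolation are illustrative assumptions, not the exact feature definitions used by LiteSing.

```python
import math
from typing import List, Tuple


def full_conditions(
    f0_hz: List[float],
    spectrogram: List[List[float]],
    eps: float = 1e-10,
) -> Tuple[List[float], List[int], List[float]]:
    """Hypothetical helper: derive (energy_db, vuv, log_f0) per frame.

    The definitions below are assumptions for illustration only,
    not the exact features used by LiteSing.
    """
    if len(f0_hz) != len(spectrogram):
        raise ValueError("f0_hz and spectrogram must have the same number of frames")

    # Dynamic spectrogram energy: per-frame energy in dB (eps avoids log(0)).
    energy_db = [10.0 * math.log10(sum(m * m for m in frame) + eps) for frame in spectrogram]

    # V/UV decision: a frame counts as voiced when its F0 is above 0 Hz.
    vuv = [1 if f > 0.0 else 0 for f in f0_hz]

    # Dynamic pitch curve: log-F0 on voiced frames, linearly interpolated
    # across unvoiced gaps and held constant before the first and after
    # the last voiced frame, so the curve is continuous.
    voiced = [i for i, v in enumerate(vuv) if v]
    if not voiced:
        return energy_db, vuv, [0.0] * len(f0_hz)
    log_f0 = [math.log(f) if v else 0.0 for f, v in zip(f0_hz, vuv)]
    first, last = voiced[0], voiced[-1]
    for i in range(first):
        log_f0[i] = log_f0[first]
    for i in range(last + 1, len(log_f0)):
        log_f0[i] = log_f0[last]
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            log_f0[i] = (1.0 - t) * log_f0[a] + t * log_f0[b]
    return energy_db, vuv, log_f0


# Usage with toy data: five frames, two of them voiced.
f0 = [0.0, 220.0, 0.0, 440.0, 0.0]
spec = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.0, 0.0]]
energy, vuv, lf0 = full_conditions(f0, spec)
# vuv -> [0, 1, 0, 1, 0]; lf0[2] lies halfway between log(220) and log(440)
```

Keeping V/UV as a separate binary stream while making the pitch curve continuous mirrors the abstract's idea of predicting pitch-related conditions apart from timbre; in a real pipeline the F0 and spectra would come from a WORLD analysis rather than hand-written lists.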

Pages: 7078-7082

Number of pages: 5
