Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System

Cited by: 18
Authors
Hono, Yukiya [1 ]
Hashimoto, Kei [1 ,2 ]
Oura, Keiichiro [2 ]
Nankaku, Yoshihiko [3 ]
Tokuda, Keiichi [4 ]
Affiliations
[1] Nagoya Inst Technol, Comp Sci, Nagoya, Aichi 4668555, Japan
[2] Nagoya Inst Technol, Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[3] Nagoya Inst Technol, Dept Elect & Elect Engn, Nagoya, Aichi 4668555, Japan
[4] Nagoya Inst Technol, Elect & Elect Engn, Nagoya, Aichi 4668555, Japan
Keywords
Acoustics; Hidden Markov models; Feature extraction; Training; Predictive models; Music; Training data; Automatic pitch correction; neural network; singing voice synthesis; timing modeling; vibrato modeling;
DOI
10.1109/TASLP.2021.3104165
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents Sinsy, a deep neural network (DNN)-based singing voice synthesis (SVS) system. In recent years, DNNs have been used in statistical parametric SVS systems, and DNN-based SVS systems have outperformed conventional hidden Markov model (HMM)-based ones. An SVS system must synthesize a singing voice whose pitch and timing strictly follow a given musical score. It should also reproduce singing expressions that are not written in the score, such as vibrato and timing fluctuations. The proposed system consists of four modules (a time-lag model, a duration model, an acoustic model, and a vocoder), which together allow singing voices to be synthesized with these characteristics taken into account. To better model the singing voice, the proposed system incorporates improved approaches to pitch and vibrato modeling, as well as better training criteria, into the acoustic model. In addition, we incorporate PeriodNet, a non-autoregressive neural vocoder that is robust with respect to pitch, into the system to generate high-fidelity singing voice waveforms. Moreover, we propose automatic pitch correction techniques for DNN-based SVS that synthesize singing voices with correct pitch even when the training data contain out-of-tune phrases. Experimental results show that our system synthesizes singing voices with better timing, more natural vibrato, and correct pitch, and that it achieves higher mean opinion scores in subjective evaluation tests.
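The abstract does not spell out how pitch, vibrato, or pitch correction are modeled. As a rough illustration only, the Python sketch below shows one common way to think about score-relative pitch: F0 is expressed as a per-frame deviation in cents from the equal-tempered pitch of the score note, vibrato is approximated as a sinusoid on that deviation, and a naive "correction" removes each note's constant offset. The 5 ms frame shift, the sinusoidal vibrato form, the mean-offset correction, and all function names are assumptions for this sketch, not Sinsy's published formulation.

```python
import math
from typing import List


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def shift_by_cents(base_hz: float, cents: float) -> float:
    """Raise (positive) or lower (negative) a frequency by a number of cents."""
    return base_hz * 2.0 ** (cents / 1200.0)


def vibrato_cents(n_frames: int, depth_cents: float, rate_hz: float,
                  frame_shift_s: float = 0.005) -> List[float]:
    """Assumed sinusoidal vibrato: depth * sin(2*pi*rate*t), sampled once per frame."""
    return [depth_cents * math.sin(2.0 * math.pi * rate_hz * i * frame_shift_s)
            for i in range(n_frames)]


def remove_note_offset(deviation: List[float]) -> List[float]:
    """Hypothetical naive correction: subtract the note's mean deviation so the
    contour is centred on the score pitch, leaving faster fluctuations intact."""
    if not deviation:
        return []
    mean = sum(deviation) / len(deviation)
    return [d - mean for d in deviation]


def render_f0(note: int, deviation: List[float]) -> List[float]:
    """Convert per-frame deviations (in cents) around the score note into F0 in Hz."""
    base = midi_to_hz(note)
    return [shift_by_cents(base, d) for d in deviation]


if __name__ == "__main__":
    # A one-second A4 (200 frames at 5 ms) sung 30 cents flat, with 5 Hz vibrato.
    sung = [-30.0 + v for v in vibrato_cents(200, depth_cents=50.0, rate_hz=5.0)]
    corrected = remove_note_offset(sung)
    f0 = render_f0(69, corrected)
    print(f"mean deviation before: {sum(sung) / len(sung):.1f} cents")
    print(f"mean deviation after:  {sum(corrected) / len(corrected):.1f} cents")
    print(f"F0 range after correction: {min(f0):.1f}-{max(f0):.1f} Hz")
```

Working in cents relative to the score note keeps the "follow the score" requirement explicit: a deviation of zero is exactly the written pitch, so an out-of-tune offset shows up as a nonzero mean that can be targeted separately from expressive fluctuations such as vibrato.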
Pages: 2803-2815
Number of pages: 13
Related papers (50 in total)
  • [41] A Novel Fingerprint Recovery Scheme using Deep Neural Network-based Learning
    Lee, Samuel
    Jang, Seok-Woo
    Kim, Dongho
    Hahn, Hernsoo
    Kim, Gye-Young
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (26-27) : 34121 - 34135
  • [42] Deep Neural Network-based Active Region Magnetogram Patch Super Resolution
    Habeeb, Mohammed Shoebuddin
    Aydin, Berkay
    Ahmadzadeh, Azim
    Georgoulis, Manolis
    Angryk, Rafal A.
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 4200 - 4207
  • [43] Deep Neural Network-Based Permittivity Inversions for Ground Penetrating Radar Data
    Ji, Yintao
    Zhang, Fengkai
    Wang, Jing
    Wang, Zhengfang
    Jiang, Peng
    Liu, Hanchi
    Sui, Qingmei
    IEEE SENSORS JOURNAL, 2021, 21 (06) : 8172 - 8183
  • [44] A Deep Neural Network-Based Method for Prediction of Dementia Using Big Data
    Kim, Jungyoon
    Lim, Jihye
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2021, 18 (10)
  • [45] Deep Neural Network-Based Precoder for Fairness Aware Secure NOMA Scheme
    Lee, Jinyoung
    Yun, Sangseok
    Kim, Il-Min
    Ha, Jeongseok
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2022, 71 (05) : 5615 - 5620
  • [46] Neural Network-Based Undersampling Techniques
    Arefeen, Md Adnan
    Nimi, Sumaiya Tabassum
    Rahman, M. Sohel
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (02) : 1111 - 1120
  • [47] Neural Kalman Filters for Acoustic Echo Cancellation: Comparison of deep neural network-based extensions [Special Issue On Model-Based and Data-Driven Audio Signal Processing]
    Seidel, Ernst
    Enzner, Gerald
    Mowlaee, Pejman
    Fingscheidt, Tim
    IEEE SIGNAL PROCESSING MAGAZINE, 2024, 41 (06) : 24 - 38
  • [48] CUBIC-SPLINES NEURAL NETWORK-BASED SYSTEM FOR IMAGE RETRIEVAL
    Sadek, Samy
    Al-Hamadi, Ayoub
    Michaelis, Bernd
    Sayed, Usama
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 273 - +
  • [49] A neural network-based shape control system for cold rolling operations
    Peng, Yan
    Liu, Hongmin
    Duc, R.
    JOURNAL OF MATERIALS PROCESSING TECHNOLOGY, 2008, 202 (1-3) : 54 - 60
  • [50] FGP-GAN: Fine-Grained Perception Integrated Generative Adversarial Network for Expressive Mandarin Singing Voice Synthesis
    Liu, Xin
    Zhang, Weiwei
    Zheng, Zhaohui
    Pan, Mingyang
    Wang, Rong
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (03) : 6054 - 6063