Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1]
Hashimoto, Kei [1]
Oura, Keiichiro [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliation
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016
Funding
Japan Science and Technology Agency (JST)
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM
DOI
10.21437/Interspeech.2016-1027
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Singing voice synthesis techniques based on hidden Markov models (HMMs) have been proposed. In these approaches, the spectrum, excitation, and duration of singing voices are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices has still not reached that of natural singing voices. Deep neural networks (DNNs) have substantially improved on conventional approaches in various research areas, including speech recognition, image recognition, and speech synthesis. DNN-based text-to-speech (TTS) synthesis can produce high-quality speech. In a DNN-based TTS system, a DNN is trained to represent the mapping from contextual features to acoustic features, a mapping that an HMM-based TTS system instead models with decision-tree-clustered context-dependent HMMs. In this paper, we propose singing voice synthesis based on a DNN and evaluate its effectiveness. The relationship between the musical score and its acoustic features is modeled frame by frame by a DNN. To address the sparseness of pitch contexts in the database, musical-note-level pitch normalization and linear interpolation are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
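The abstract names two excitation-feature preparation steps without spelling them out. The following is a minimal sketch of one plausible reading, not the paper's implementation. It assumes that note-level pitch normalization means expressing each frame's log F0 relative to the log F0 of the score note that frame falls under, and that linear interpolation fills unvoiced frames to give a continuous log F0 contour. The function names, the MIDI convention (note 69 = A4 = 440 Hz), marking unvoiced frames with None, and assigning a note number to every frame (rest handling is left out) are all illustrative assumptions.

```python
import bisect
import math
from typing import List, Optional, Sequence


def midi_to_log_f0(note: int) -> float:
    """Natural-log F0 of a MIDI note number (assumed convention: 69 = A4 = 440 Hz)."""
    return math.log(440.0 * 2.0 ** ((note - 69) / 12.0))


def interpolate_log_f0(log_f0: Sequence[Optional[float]]) -> List[float]:
    """Fill unvoiced frames (None) by linear interpolation between the nearest
    voiced frames; leading/trailing unvoiced frames copy the nearest voiced value."""
    voiced = [i for i, v in enumerate(log_f0) if v is not None]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")
    out: List[float] = []
    for i, v in enumerate(log_f0):
        if v is not None:
            out.append(v)
            continue
        # Index of the first voiced frame after i (i itself is unvoiced).
        k = bisect.bisect_left(voiced, i)
        left = voiced[k - 1] if k > 0 else None
        right = voiced[k] if k < len(voiced) else None
        if left is None:
            out.append(log_f0[right])
        elif right is None:
            out.append(log_f0[left])
        else:
            w = (i - left) / (right - left)
            out.append((1.0 - w) * log_f0[left] + w * log_f0[right])
    return out


def normalize_to_notes(log_f0: Sequence[float], notes: Sequence[int]) -> List[float]:
    """Hypothetical note-level normalization: residual of each frame's log F0
    relative to the pitch of the score note assigned to that frame."""
    if len(log_f0) != len(notes):
        raise ValueError("one note number per frame is required")
    return [f - midi_to_log_f0(n) for f, n in zip(log_f0, notes)]


def denormalize(residual: Sequence[float], notes: Sequence[int]) -> List[float]:
    """Inverse of normalize_to_notes: add the note pitch back at synthesis time."""
    if len(residual) != len(notes):
        raise ValueError("one note number per frame is required")
    return [r + midi_to_log_f0(n) for r, n in zip(residual, notes)]


# Usage: four frames, the middle two unvoiced, moving from A4 (69) to A#4 (70).
frames = [math.log(440.0), None, None, math.log(440.0 * 2.0 ** (1.0 / 12.0))]
notes = [69, 69, 70, 70]
filled = interpolate_log_f0(frames)
residual = normalize_to_notes(filled, notes)
```

Under this reading, an acoustic model would predict the residual rather than absolute log F0, and `denormalize` restores the absolute contour from the score. Because the residual stays near zero whatever the note, pitches that are rare in the training data may not need to be covered by the pitch context itself, which is one way to interpret the sparseness problem the abstract mentions.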
Pages: 2478-2482
Number of pages: 5
Related papers
  • [1] Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System
    Hono, Yukiya
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2803 - 2815
  • [2] Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling
    Yi, Yuan-Hao
    Ai, Yang
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2019, 2019, : 2593 - 2597
  • [3] SINGING VOICE SYNTHESIS BASED ON GENERATIVE ADVERSARIAL NETWORKS
    Hono, Yukiya
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6955 - 6959
  • [4] FAST AND HIGH-QUALITY SINGING VOICE SYNTHESIS SYSTEM BASED ON CONVOLUTIONAL NEURAL NETWORKS
    Nakamura, Kazuhiro
    Takaki, Shinji
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2020, : 7239 - 7243
  • [5] An HMM-based Singing Voice Synthesis System
    Saino, Keijiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2274 - 2277
  • [6] Korean Singing Voice Synthesis System based on an LSTM Recurrent Neural Network
    Kim, Juntae
    Choi, Heejin
    Park, Jinuk
    Hahn, Minsoo
    Kim, Sangjin
    Kim, Jong-Jin
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1551 - 1555
  • [7] TEMPLATE-BASED PERSONALIZED SINGING VOICE SYNTHESIS
    Cen, Ling
    Dong, Minghui
    Chan, Paul
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4509 - 4512
  • [8] VOICE SOURCE MODELLING USING DEEP NEURAL NETWORKS FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Raitio, Tuomo
    Lu, Heng
    Kane, John
    Suni, Antti
    Vainio, Martti
    King, Simon
    Alku, Paavo
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 2290 - 2294
  • [9] SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System
    Zhao, Junchuan
    Chetwin, Low Qi Hong
    Wang, Ye
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2641 - 2653
  • [10] A HMM-based Mandarin Chinese Singing Voice Synthesis System
    Li, Xian
    Wang, Zengfu
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2016, 3 (02) : 192 - 202