Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1]
Hashimoto, Kei [1]
Oura, Keiichiro [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Funding
Japan Science and Technology Agency;
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM;
DOI
10.21437/Interspeech.2016-1027
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Singing voice synthesis techniques have been proposed based on a hidden Markov model (HMM). In these approaches, the spectrum, excitation, and duration of singing voices are simultaneously modeled with context-dependent HMMs and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices still has not reached that of natural singing voices. Deep neural networks (DNNs) have largely improved on conventional approaches in various research areas including speech recognition, image recognition, speech synthesis, etc. The DNN-based text-to-speech (TTS) synthesis can synthesize high quality speech. In the DNN-based TTS system, a DNN is trained to represent the mapping function from contextual features to acoustic features, which are modeled by decision tree-clustered context dependent HMMs in the HMM-based TTS system. In this paper, we propose singing voice synthesis based on a DNN and evaluate its effectiveness. The relationship between the musical score and its acoustic features is modeled in frames by a DNN. For the sparseness of pitch context in a database, a musical-note-level pitch normalization and linear-interpolation techniques are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
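The excitation-feature preparation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: unvoiced frames are marked by an F0 of 0, interpolation is done on log F0, note pitch is given as a MIDI note number per frame, and normalization means subtracting the log frequency of the score note from the observed log F0. The function names are hypothetical and are not taken from the paper.

```python
import math


def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


def interpolate_log_f0(f0_hz):
    """Return a continuous log-F0 track.

    Assumption: frames with f0 <= 0 are unvoiced. Gaps between voiced
    frames are filled by linear interpolation of log F0; leading and
    trailing unvoiced frames copy the nearest voiced value.
    """
    voiced = [i for i, f in enumerate(f0_hz) if f > 0]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")

    log_f0 = [0.0] * len(f0_hz)
    for i in voiced:
        log_f0[i] = math.log(f0_hz[i])

    # Hold the first/last voiced value at the edges.
    for i in range(voiced[0]):
        log_f0[i] = log_f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0_hz)):
        log_f0[i] = log_f0[voiced[-1]]

    # Linearly interpolate inside each unvoiced gap.
    for left, right in zip(voiced, voiced[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            log_f0[i] = (1.0 - w) * log_f0[left] + w * log_f0[right]
    return log_f0


def normalize_by_note(log_f0, note_midi_per_frame):
    """Assumed note-level normalization: log F0 minus log of the note pitch."""
    return [lf - math.log(midi_to_hz(n))
            for lf, n in zip(log_f0, note_midi_per_frame)]


def denormalize_by_note(residual, note_midi_per_frame):
    """Invert the normalization and return F0 in Hz."""
    return [math.exp(r + math.log(midi_to_hz(n)))
            for r, n in zip(residual, note_midi_per_frame)]


# Toy example: five frames, two of them voiced (A3 = 220 Hz, A4 = 440 Hz).
f0 = [0.0, 220.0, 0.0, 440.0, 0.0]
notes = [57, 57, 57, 69, 69]  # MIDI note per frame, taken from the score
residual = normalize_by_note(interpolate_log_f0(f0), notes)
restored = denormalize_by_note(residual, notes)
```

Under these assumptions the network would be trained on the residual rather than on absolute pitch, so a note that is rare in the database still yields a target in a familiar range; at synthesis time the score pitch is added back.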
Pages: 2478-2482
Page count: 5