Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1 ]
Hashimoto, Kei [1 ]
Oura, Keiichiro [1 ]
Nankaku, Yoshihiko [1 ]
Tokuda, Keiichi [1 ]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016
Funding
Japan Science and Technology Agency (JST)
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM;
DOI
10.21437/Interspeech.2016-1027
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Singing voice synthesis techniques based on hidden Markov models (HMMs) have been proposed. In these approaches, the spectrum, excitation, and duration of the singing voice are simultaneously modeled with context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voice has still not reached that of natural singing. Deep neural networks (DNNs) have substantially improved on conventional approaches in various research areas, including speech recognition, image recognition, and speech synthesis. DNN-based text-to-speech (TTS) synthesis can produce high-quality speech: a DNN is trained to represent the mapping from contextual features to acoustic features, which are modeled by decision-tree-clustered context-dependent HMMs in an HMM-based TTS system. In this paper, we propose DNN-based singing voice synthesis and evaluate its effectiveness. The relationship between a musical score and its acoustic features is modeled frame by frame with a DNN. To cope with the sparseness of pitch contexts in the training database, musical-note-level pitch normalization and linear interpolation are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperforms the HMM-based system in terms of naturalness.
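The musical-note-level pitch normalization described in the abstract can be sketched as: linearly interpolate log-F0 through unvoiced frames so every frame has a value, then subtract the nominal log-F0 of the score note from the observed log-F0. The following is only an illustrative Python sketch under assumed conventions (MIDI note numbers for score pitch, `None` marking unvoiced frames); the paper's actual feature pipeline may differ.

```python
import math


def note_to_log_f0(midi_note: int) -> float:
    """Nominal log-F0 of a MIDI note, with A4 (note 69) = 440 Hz."""
    return math.log(440.0 * 2.0 ** ((midi_note - 69) / 12.0))


def interpolate_unvoiced(log_f0):
    """Linearly interpolate over unvoiced frames (None) so that
    every frame carries a continuous log-F0 value."""
    voiced = [i for i, v in enumerate(log_f0) if v is not None]
    if not voiced:
        return list(log_f0)
    out = list(log_f0)
    first, last = voiced[0], voiced[-1]
    # Hold the nearest voiced value at the edges.
    for i in range(first):
        out[i] = log_f0[first]
    for i in range(last + 1, len(out)):
        out[i] = log_f0[last]
    # Linear interpolation between consecutive voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * log_f0[a] + t * log_f0[b]
    return out


def normalize_pitch(log_f0, note_per_frame):
    """Subtract the score note's nominal log-F0 from each frame's
    (interpolated) observed log-F0, yielding a residual the model
    can learn independently of absolute note pitch."""
    cont = interpolate_unvoiced(log_f0)
    return [lf - note_to_log_f0(n) for lf, n in zip(cont, note_per_frame)]
```

A frame singing exactly on pitch yields a residual of zero; deviations (vibrato, overshoot, preparation) survive as small residuals, which is what makes the normalized feature learnable even when the database covers few absolute pitches.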
Pages: 2478-2482 (5 pages)