Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1 ]
Hashimoto, Kei [1 ]
Oura, Keiichiro [1 ]
Nankaku, Yoshihiko [1 ]
Tokuda, Keiichi [1 ]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Source
17th Annual Conference of the International Speech Communication Association (Interspeech 2016), Vols. 1-5: Understanding Speech Processing in Humans and Machines | 2016
Funding
Japan Science and Technology Agency (JST);
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM;
DOI
10.21437/Interspeech.2016-1027
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Singing voice synthesis techniques based on hidden Markov models (HMMs) have been proposed. In these approaches, the spectrum, excitation, and duration of singing voices are simultaneously modeled with context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices still has not reached that of natural singing voices. Deep neural networks (DNNs) have substantially improved on conventional approaches in various research areas, including speech recognition, image recognition, and speech synthesis, and DNN-based text-to-speech (TTS) systems can synthesize high-quality speech. In a DNN-based TTS system, a DNN is trained to represent the mapping from contextual features to acoustic features, which is modeled by decision-tree-clustered context-dependent HMMs in an HMM-based TTS system. In this paper, we propose DNN-based singing voice synthesis and evaluate its effectiveness. The relationship between a musical score and its acoustic features is modeled frame by frame with a DNN. To cope with the sparseness of pitch contexts in the database, musical-note-level pitch normalization and linear-interpolation techniques are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
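The pitch-preparation step described in the abstract can be sketched in a few lines: the observed log F0 contour is first made continuous by linear interpolation through unvoiced frames, and the log F0 of the note written in the score is then subtracted, so the DNN models relative rather than absolute pitch and is less affected by pitch contexts that are sparse in the database. The sketch below is a minimal illustration under our own assumptions (frame-level F0 in Hz, score notes as MIDI numbers); the function names are illustrative and not taken from the paper's implementation.

```python
import numpy as np


def midi_to_log_f0(midi_note: np.ndarray) -> np.ndarray:
    # Convert MIDI note numbers to log fundamental frequency in Hz
    # (equal temperament, A4 = MIDI 69 = 440 Hz).
    return np.log(440.0 * 2.0 ** ((midi_note - 69) / 12.0))


def interpolate_unvoiced(log_f0: np.ndarray, voiced: np.ndarray) -> np.ndarray:
    # Make the log F0 contour continuous by linearly interpolating
    # through unvoiced frames, as the abstract describes.
    frames = np.arange(len(log_f0))
    return np.interp(frames, frames[voiced], log_f0[voiced])


def note_level_pitch_normalization(log_f0: np.ndarray,
                                   voiced: np.ndarray,
                                   note_midi: np.ndarray) -> np.ndarray:
    # Subtract the written note's log F0 from the interpolated observed
    # log F0, yielding a relative pitch contour that can generalize to
    # note heights that are rare or absent in the training data.
    continuous_log_f0 = interpolate_unvoiced(log_f0, voiced)
    return continuous_log_f0 - midi_to_log_f0(note_midi)


# Toy example: five frames sung on A4 (MIDI 69); frame 2 is unvoiced.
observed_f0 = np.array([438.0, 441.0, 1.0, 442.0, 440.0])  # Hz
voiced_mask = np.array([True, True, False, True, True])
score_notes = np.full(5, 69)                               # MIDI numbers

relative = note_level_pitch_normalization(np.log(observed_f0),
                                          voiced_mask, score_notes)
print(relative)  # values near 0: the singer stays close to the score pitch
```

At synthesis time the same normalization would be inverted: the DNN predicts the relative contour, and the note's log F0 is added back to recover absolute pitch.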
Pages: 2478-2482
Page count: 5
Related papers
50 items in total (items [41]-[50] shown below)
  • [41] "LiteSing: Towards Fast, Lightweight and Expressive Singing Voice Synthesis." Zhuang, Xiaobin; Jiang, Tao; Chou, Szu-Yu; Wu, Bin; Hu, Peng; Lui, Simon. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 7078-7082.
  • [42] "Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training." Chen, Ling-Hui; Ling, Zhen-Hua; Liu, Li-Juan; Dai, Li-Rong. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1859-1872.
  • [43] "Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher." Xue, Heyang; Yang, Shan; Lei, Yi; Xie, Lei; Li, Xiulin. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 522-529.
  • [44] "Speaking Rate Estimation Based on Deep Neural Networks." Tomashenko, Natalia; Khokhlov, Yuri. Speech and Computer, 2014, 8773: 418-424.
  • [45] "Skin Lesion Detection Based on Deep Neural Networks." Choudhary, Priya; Singhai, Jyoti; Yadav, J. S. Chemometrics and Intelligent Laboratory Systems, 2022, 230.
  • [46] "WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN." Chandna, Pritish; Blaauw, Merlijn; Bonada, Jordi; Gomez, Emilia. 2019 27th European Signal Processing Conference (EUSIPCO), 2019.
  • [47] "WeSinger: Data-Augmented Singing Voice Synthesis with Auxiliary Losses." Zhang, Zewang; Zheng, Yibin; Li, Xinhui; Lu, Li. Interspeech 2022, 2022: 4252-4256.
  • [48] "MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis." Tae, Jaesung; Kim, Hyeongju; Lee, Younggun. 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021.
  • [49] "DSUSing: Dual Scale U-Nets for Singing Voice Synthesis." Park, Hyunju; Woo, Jihwan. 2024 IEEE International Conference on Big Data and Smart Computing (BigComp 2024), 2024: 201-206.
  • [50] "XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System." Lu, Peiling; Wu, Jie; Luan, Jian; Tan, Xu; Zhou, Li. Interspeech 2020, 2020: 1306-1310.