WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

Cited by: 52
Authors
Chandna, Pritish [1]
Blaauw, Merlijn [1]
Bonada, Jordi [1]
Gomez, Emilia [1, 2]
Affiliations
[1] Universitat Pompeu Fabra, Music Technology Group, Barcelona, Spain
[2] European Commission, Joint Research Centre, Seville, Spain
Source
2019 27th European Signal Processing Conference (EUSIPCO) | 2019
Funding
EU Horizon 2020
Keywords
Wasserstein-GAN; DCGAN; WORLD vocoder; Singing Voice Synthesis; Block-wise Predictions
DOI
10.23919/eusipco.2019.8903099
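Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract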
We present a deep neural network based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling, to separate the influence of pitch and timbre. This facilitates the modelling of the large variability of pitch in the singing voice. Our network takes as input a block of consecutive frame-wise linguistic and fundamental frequency features, along with a global singer identity, and outputs the vocoder features corresponding to that block. This block-wise approach, together with the training methodology, allows us to model temporal dependencies within the features of the input block. For inference, sequential blocks are concatenated using an overlap-add procedure. We show, using objective metrics and a subjective listening test, that the performance of our model is competitive with the state of the art and with the original sample. We also present examples of the synthesis on a supplementary website and provide the source code via GitHub.
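The overlap-add step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the window shape, block length, or hop size, so the triangular weights and the names `triangular_window`, `overlap_add`, and `hop` below are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def triangular_window(block_len: int) -> List[float]:
    """Symmetric, strictly positive triangular weights (assumed window shape)."""
    centre = (block_len - 1) / 2.0
    half = (block_len + 1) / 2.0
    return [1.0 - abs(i - centre) / half for i in range(block_len)]


def overlap_add(
    blocks: Sequence[Sequence[Sequence[float]]], hop: int
) -> List[List[float]]:
    """Merge equally sized blocks of frame-wise feature vectors.

    blocks[b][t][d] is feature d of frame t in block b, and consecutive
    blocks start `hop` frames apart. Frames covered by several blocks are
    averaged with triangular weights, so every output frame is a convex
    combination of the predictions that cover it.
    """
    if not blocks:
        return []
    block_len = len(blocks[0])
    n_feats = len(blocks[0][0])
    if not 0 < hop <= block_len:
        raise ValueError("hop must be between 1 and the block length")

    total = hop * (len(blocks) - 1) + block_len
    window = triangular_window(block_len)
    acc = [[0.0] * n_feats for _ in range(total)]  # weighted feature sums
    weight = [0.0] * total                         # summed weights per frame

    for b, block in enumerate(blocks):
        start = b * hop
        for t, frame in enumerate(block):
            w = window[t]
            weight[start + t] += w
            for d, value in enumerate(frame):
                acc[start + t][d] += w * value

    # Normalise so that overlapping regions are a weighted average.
    return [[v / weight[t] for v in acc[t]] for t in range(total)]
```

With two 4-frame blocks and `hop=2`, the result has 6 frames: the first two come only from the first block, the last two only from the second, and the middle two cross-fade between the two predictions.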
Pages: 5