High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks

Cited by: 2
Authors
Yoneyama, Reo [1 ]
Wu, Yi-Chiao [1 ]
Toda, Tomoki [2 ]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya 4648601, Japan
[2] Nagoya Univ, Informat Technol Ctr, Nagoya 4648601, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Vocoders; Controllability; Speech processing; Neural networks; Training; Mathematical models; Acoustics; Speech synthesis; neural vocoder; source-filter model; unified source-filter networks; WAVE-FORM GENERATION; SPEECH SYNTHESIS; MODEL;
DOI
10.1109/TASLP.2023.3313410
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
We introduce unified source-filter generative adversarial networks (uSFGAN), a waveform generative model conditioned on acoustic features that represents the source-filter architecture within a single generator network. Unlike previous neural source-filter models, in which parametric signal processing modules are combined with neural networks, our approach enables unified optimization of both the source excitation generation and resonance filtering parts to achieve higher sound quality. In the uSFGAN framework, several specific regularization losses are proposed to make the source excitation generation part output reasonable source excitation signals. Both objective and subjective experiments are conducted, and the results demonstrate that the proposed uSFGAN achieves sound quality comparable to HiFi-GAN in the speech reconstruction task and outperforms WORLD in the F0 transformation task. Moreover, we argue that the F0-driven mechanism and the inductive bias obtained from source-filter modeling improve robustness to F0 values unseen in training, as shown by the experimental evaluations. Audio samples are available at our demo site: https://chomeyama.github.io/PitchControllableNeuralVocoder-Demo/.
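The classical source-filter decomposition that uSFGAN learns end-to-end in a single network can be illustrated with a toy, non-neural sketch: an F0-driven sinusoidal excitation (noise in unvoiced frames) passed through a separate resonance filter. This is only a minimal illustration of the underlying signal model; the function names, hop size, and the hand-picked filter below are assumptions for the example and are not the paper's learned modules.

```python
import numpy as np

def sine_excitation(f0, sr=16000, hop=80):
    """Source stage: build an excitation signal from a frame-level F0 contour."""
    # Upsample the frame-level F0 contour to the sample rate.
    f0_up = np.repeat(np.asarray(f0, dtype=float), hop)
    # Integrate instantaneous frequency to obtain the sinusoid's phase.
    phase = 2.0 * np.pi * np.cumsum(f0_up) / sr
    voiced = f0_up > 0
    rng = np.random.default_rng(0)
    noise = 0.1 * rng.standard_normal(len(f0_up))
    # Sinusoid where voiced (f0 > 0), low-level Gaussian noise where unvoiced.
    return np.where(voiced, np.sin(phase), noise)

def resonance_filter(excitation, impulse_response):
    """Filter stage: convolve the excitation with an impulse response."""
    return np.convolve(excitation, impulse_response, mode="same")

f0 = np.array([200.0] * 10 + [0.0] * 5)  # 10 voiced frames at 200 Hz, 5 unvoiced
e = sine_excitation(f0)
h = np.hanning(64)
h /= h.sum()                             # toy low-pass filter standing in for the vocal tract
y = resonance_filter(e, h)
```

In a parametric vocoder such as WORLD, the two stages are fixed signal processing; uSFGAN instead realizes both inside one generator so that they are optimized jointly, while its regularization losses keep the excitation part producing source-like signals as in this decomposition.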
Pages: 3717-3729
Page count: 13
Related Papers
11 records in total
  • [1] A Fast High-Fidelity Source-Filter Vocoder With Lightweight Neural Modules
    Yang, Runxuan
    Peng, Yuyang
    Hu, Xiaolin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3362 - 3373
  • [2] Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN
    Yoneyama, Reo
    Wu, Yi-Chiao
    Toda, Tomoki
    INTERSPEECH 2021, 2021, : 2187 - 2191
  • [3] Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
    Lu, Ye-Xin
    Ai, Yang
    Ling, Zhen-Hua
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 68 - 80
  • [4] Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
    Song, Kun
    Cong, Jian
    Wang, Xinsheng
    Zhang, Yongmao
    Xie, Lei
    Jiang, Ning
    Wu, Haiying
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 71 - 75
  • [5] Reverberation Modeling for Source-Filter-based Neural Vocoder
    Ai, Yang
    Wang, Xin
    Yamagishi, Junichi
    Ling, Zhen-Hua
    INTERSPEECH 2020, 2020, : 3560 - 3564
  • [6] High-fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling
    Tobing, Patrick Lumban
    Toda, Tomoki
    INTERSPEECH 2021, 2021, : 2217 - 2221
  • [7] FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
    Shen, Rubing
    Ren, Yanzhen
    Sung, Zongkun
    INTERSPEECH 2024, 2024, : 3884 - 3888
  • [8] Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
    Yoon, Hyun-Wook
    Lee, Sang-Hoon
    Noh, Hyeong-Rae
    Lee, Seong-Whan
    INTERSPEECH 2020, 2020, : 3545 - 3549
  • [9] Spectral prediction method based on the transformer neural network for high-fidelity color reproduction
    Li, Huailin
    Zheng, Yingying
    Liu, Qinsen
    Sun, Bangyong
    OPTICS EXPRESS, 2024, 32 (17): : 30481 - 30499
  • [10] Convolutional Neural Network Based Denoising for Digital Image Correlation Reconstructing High-Fidelity Deformation Field
    Niu, Bangyan
    Ji, Jingjing
    2023 IEEE/ASME INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS, AIM, 2023, : 727 - 732