JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Authors
Cho, Hyunjae [1]
Lee, Junhyeok [2]
Jung, Wonbin [3]
Affiliations
[1] Seoul Natl Univ SNU, Seoul, South Korea
[2] Supertone Inc, Seoul, South Korea
[3] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
Source
INTERSPEECH 2024 | 2024
Keywords
speech synthesis; vocoder; alias-free; GAN; shift-equivariant
DOI
10.21437/Interspeech.2024-1447
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
Pages: 3879-3883
Page count: 5