JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Cited by: 0
Authors
Cho, Hyunjae [1 ]
Lee, Junhyeok [2 ]
Jung, Wonbin [3 ]
Affiliations
[1] Seoul Natl Univ SNU, Seoul, South Korea
[2] Supertone Inc, Seoul, South Korea
[3] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
Source
INTERSPEECH 2024 | 2024
Keywords
speech synthesis; vocoder; alias-free; GAN; shift-equivariant
DOI
10.21437/Interspeech.2024-1447
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.
Pages: 3879-3883
Page count: 5