GAN-based Augmentation for Populating Speech Dataset with High Fidelity Synthesized Audio

Times cited: 0
Authors
Back, Moon-Ki [1]
Yoon, Seung-Won [1]
Lee, Kyu-Chul [1]
Affiliation
[1] Chungnam Natl Univ, Dept Comp Engn, Daejeon, South Korea
Source
11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020) | 2020
Keywords
Audio augmentation; generative adversarial networks; harmonic percussive separation; progressive growing; speech dataset
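DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract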
In this paper, we present an audio augmentation method that generates synthetic audio using Generative Adversarial Networks (GANs). We propose a training strategy that first uses Harmonic Percussive Source Separation (HPSS) to extract spectral features and then improves the fidelity of the synthesized audio by applying progressively growing GANs. We demonstrate the method on a public speech dataset released by Google TensorFlow. With our method, the Fréchet Inception Distance (FID) was 12.514, compared with 14.012 for existing image-generating GANs (lower FID indicates better fidelity).
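For readers unfamiliar with the metric, the following is a minimal sketch of how a Fréchet distance of the kind reported above can be computed from two sets of feature vectors. It is not the authors' evaluation code: the function names `frechet_distance` and `fid_from_features` are hypothetical, and the feature extractor (an Inception network in the usual FID setup) is assumed to have been applied already, since the abstract does not describe it.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Formula: ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2)).
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.atleast_2d(np.asarray(sigma1, dtype=float))
    sigma2 = np.atleast_2d(np.asarray(sigma2, dtype=float))

    diff = mu1 - mu2
    # The trace of the matrix square root equals the sum of the square roots
    # of the eigenvalues of sigma1 @ sigma2. For positive semi-definite
    # covariances these eigenvalues are real and non-negative; the clip only
    # guards against tiny negative values caused by rounding.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(real_feats, fake_feats):
    """Fit a Gaussian to each (n_samples, n_features) array and compare them."""
    real_feats = np.asarray(real_feats, dtype=float)
    fake_feats = np.asarray(fake_feats, dtype=float)
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_f = np.cov(fake_feats, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)
```

In this sketch, identical feature sets give a distance of zero, and the value grows as the means or covariances of the real and synthesized features drift apart, which is why a lower score is read as better fidelity.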
Pages: 1267-1269
Number of pages: 3