GAN-based Augmentation for Populating Speech Dataset with High Fidelity Synthesized Audio

Cited by: 0
Authors
Back, Moon-Ki [1]
Yoon, Seung-Won [1]
Lee, Kyu-Chul [1]
Affiliations
[1] Chungnam Natl Univ, Dept Comp Engn, Daejeon, South Korea
Source
11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020) | 2020
Keywords
Audio augmentation; generative adversarial networks; harmonic percussive separation; progressive growing; speech dataset
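DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract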
In this paper, we present an audio augmentation method that generates synthetic audio using Generative Adversarial Networks (GANs). We propose a training strategy that first uses Harmonic Percussive Source Separation (HPSS) to extract spectral features and then improves the fidelity of the synthesized audio by applying progressively growing GANs. We demonstrate the method on a public speech dataset released by Google TensorFlow. Measured by Fréchet Inception Distance (FID), our method scored 12.514, compared with 14.012 for existing image-generating GANs (lower FID indicates better fidelity).
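The FID figures in the abstract compare feature statistics of real and synthesized samples. As a minimal sketch of the underlying Fréchet distance between two Gaussians fitted to feature sets, assuming the features are already extracted as NumPy arrays with one row per sample (the function name `frechet_distance` and the random stand-in features are illustrative assumptions, not the authors' evaluation code, and no Inception network is involved here):

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are 2-D arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration only.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # positive semi-definite covariances (clip guards against round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Random stand-in features (assumption): real FID uses Inception activations.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))
    fake = real + np.array([1.0, 0.0, 0.0, 0.0])  # same spread, shifted mean
    print(frechet_distance(real, real))
    print(frechet_distance(real, fake))
```

In this sketch, identical feature sets give a distance near zero, and a pure mean shift contributes the squared length of the shift, which matches the reading that a lower score means the synthesized distribution sits closer to the real one.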
Pages: 1267 - 1269
Page count: 3
Related papers
43 in total
  • [21] GAN-BASED SYNTHETIC MEDICAL IMAGE AUGMENTATION FOR CLASS IMBALANCED DERMOSCOPIC IMAGE ANALYSIS
    Alshardan, Amal
    Alahmari, Saad
    Alghamdi, Mohammed
    AL Sadig, Mutasim
    Mohamed, Abdullah
    Mohammed, Gouse Pasha
    FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2025
  • [22] Enhancement of Image Classification Using Transfer Learning and GAN-Based Synthetic Data Augmentation
    Chatterjee, Subhajit
    Hazra, Debapriya
    Byun, Yung-Cheol
    Kim, Yong-Woon
    MATHEMATICS, 2022, 10 (09)
  • [23] PlethAugment: GAN-Based PPG Augmentation for Medical Diagnosis in Low-Resource Settings
    Kiyasseh, Dani
    Tadesse, Girmaw Abebe
    Nhan, Le Nguyen Thanh
    Van Tan, Le
    Thwaites, Louise
    Zhu, Tingting
    Clifton, David
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (11) : 3226 - 3235
  • [24] HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
    Su, Jiaqi
    Jin, Zeyu
    Finkelstein, Adam
    INTERSPEECH 2020, 2020 : 4506 - 4510
  • [25] M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding
    Parcollet, Titouan
    Morchid, Mohamed
    Bost, Xavier
    Linares, Georges
    INTERSPEECH 2019, 2019 : 804 - 808
  • [26] A GAN-Based Data Augmentation Method for Imbalanced Multi-Class Skin Lesion Classification
    Su, Qichen
    Hamed, Haza Nuzly Abdull
    Isa, Mohd Adham
    Hao, Xue
    Dai, Xin
    IEEE ACCESS, 2024, 12 : 16498 - 16513
  • [27] GAN-Based Data Augmentation for AI-Enabled ATP in Free Space Optical Communication
    Liu, Yuchen
    Liu, Yejun
    Song, Song
    Chen, Kun
    Guo, Lei
    IEEE COMMUNICATIONS LETTERS, 2024, 28 (05) : 1067 - 1071
  • [28] Additional Look into GAN-based Augmentation for Deep Learning COVID-19 Image Classification
    Fedoruk O.
    Klimaszewski K.
    Ogonowski A.
    Kruk M.
    Machine Graphics and Vision, 2023, 32 (3-4) : 108 - 124
  • [29] A GAN-Based Multi-Sensor Data Augmentation Technique for CNC Machine Tool Wear Prediction
    Jiang, Yuechi
    Drescher, Benny
    Yuan, Guoguang
    IEEE ACCESS, 2023, 11 : 95782 - 95795
  • [30] Enhancing Left Ventricular Segmentation in Echocardiograms Through GAN-Based Synthetic Data Augmentation and MultiResUNet Architecture
    Kumar, Vikas
    Sharma, Nitin Mohan
    Mahapatra, Prasant K.
    Dogra, Neeti
    Maurya, Lalit
    Ahmad, Fahad
    Dahiya, Neelam
    Panda, Prashant
    DIAGNOSTICS, 2025, 15 (06)