EnvGAN: a GAN-based augmentation to improve environmental sound classification

Cited by: 0
Authors
Aswathy Madhu
Suresh K.
Affiliations
[1] APJ Abdul Kalam Technological University, Department of Electronics, College of Engineering
[2] APJ Abdul Kalam Technological University, Department of Electronics, Govt. Engineering College
Source
Artificial Intelligence Review | 2022, Vol. 55
Keywords
Generative adversarial network; Environmental sound classification; Data augmentation; Convolutional neural network; Deep learning; UrbanSound8K; ESC-10
DOI
Not available
Abstract
Several deep learning algorithms have emerged for the automatic classification of environmental sounds. However, the lack of adequate labeled training data limits their performance. Data augmentation is an appropriate solution to this problem. Generative Adversarial Networks (GANs) have successfully generated synthetic speech and musical-instrument sounds for classification applications. In this paper, we present a method for GAN-based augmentation in the context of environmental sound classification. We introduce an architecture named EnvGAN for the adversarial generation of environmental sounds. To validate the quality of the generated sounds, we conducted subjective and objective evaluations. The results indicate that EnvGAN can produce samples from various sound domains with acceptable quality. We applied this augmentation technique to three benchmark ESC datasets (ESC-10, UrbanSound8K, and the TUT Urban Acoustic Scenes development dataset) and used the augmented data to train a CNN-based classifier. Experimental results show that the new augmentation method outperforms a baseline with no augmentation by a relatively wide margin (10–12% on ESC-10, 5–7% on UrbanSound8K, and 4–5% on TUT). In particular, the GAN-based approach reduces the confusion between all pairs of classes on UrbanSound8K, which suggests that the proposed method is especially well suited to class-imbalanced datasets.
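For readers who want to experiment with the general idea, below is a minimal, illustrative sketch of GAN-based spectrogram augmentation in PyTorch. It is not the EnvGAN architecture described in the paper: the patch size (64x64 mel-spectrogram frames), latent dimension, layer widths, loss, and optimizer settings are all assumptions chosen for brevity, not values taken from the paper.

```python
# Minimal sketch of GAN-based spectrogram augmentation.
# NOTE: this is NOT the EnvGAN architecture from the paper; all shapes and
# hyperparameters below are illustrative assumptions. Requires: torch.
import torch
import torch.nn as nn

LATENT_DIM = 100          # size of the noise vector fed to the generator (assumed)
SPEC_SHAPE = (1, 64, 64)  # assumed (channels, mel bands, time frames) patch

class Generator(nn.Module):
    """Maps a noise vector to a fake 64x64 mel-spectrogram patch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(LATENT_DIM, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),         # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),           # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),            # 32x32
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                                     # 64x64
        )

    def forward(self, z):
        return self.net(z.view(-1, LATENT_DIM, 1, 1))

class Discriminator(nn.Module):
    """Scores a spectrogram patch as real (1) or generated (0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),    # 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),   # 16x16
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8x8
            nn.Conv2d(128, 1, 8, 1, 0),                            # 1x1 logit
        )

    def forward(self, x):
        return self.net(x).view(-1)

def train_step(G, D, real, opt_g, opt_d, loss=nn.BCEWithLogitsLoss()):
    """One adversarial update on a batch of real spectrogram patches."""
    b = real.size(0)
    fake = G(torch.randn(b, LATENT_DIM))

    # Discriminator: push real patches toward 1, generated patches toward 0.
    opt_d.zero_grad()
    d_loss = loss(D(real), torch.ones(b)) + loss(D(fake.detach()), torch.zeros(b))
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 on generated patches.
    opt_g.zero_grad()
    g_loss = loss(D(fake), torch.ones(b))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    real_batch = torch.randn(8, *SPEC_SHAPE)  # stand-in for real mel-spectrogram patches
    print(train_step(G, D, real_batch, opt_g, opt_d))
    # After training, G(torch.randn(n, LATENT_DIM)) yields synthetic patches that
    # can be mixed into the ESC training set to augment under-represented classes.
```

In an augmentation workflow along the lines the abstract describes, such a generator would be trained per sound class (or conditioned on class labels) and its outputs added to the training data of the downstream CNN classifier; the specific generator, discriminator, and training procedure used by EnvGAN are given in the full paper, not in this sketch.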
Pages: 6301-6320 (19 pages)