Unsupervised feature learning for environmental sound classification using Weighted Cycle-Consistent Generative Adversarial Network

Cited by: 36
Authors
Esmaeilpour, Mohammad [1]
Cardinal, Patrick [1]
Koerich, Alessandro Lameiras [1]
Affiliations
[1] Univ Quebec, ETS, 1100 Notre Dame West, Montreal, PQ H3C 1K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Environmental sound classification; Generative Adversarial Network (GAN); Cycle-Consistent GAN; K-Means++; Random forests; QUALITY ASSESSMENT; AUDIO; RECOGNITION;
DOI
10.1016/j.asoc.2019.105912
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel environmental sound classification approach incorporating unsupervised feature learning via the spherical K-Means++ algorithm and a new architecture for high-level data augmentation. The audio signal is transformed into a 2D representation using a discrete wavelet transform (DWT). The DWT spectrograms are then augmented by a novel cycle-consistent generative adversarial network architecture. This high-level augmentation bootstraps generated spectrograms in both intra- and inter-class manners by translating structural features from sample to sample. A codebook is built by coding the DWT spectrograms with the speeded-up robust feature detector and the K-Means++ algorithm. A random forest is the final learning algorithm, which learns the environmental sound classification task from the code vectors. Experimental results on four benchmark environmental sound datasets (ESC-10, ESC-50, UrbanSound8k, and DCASE-2017) show that the proposed classification approach outperforms most of the state-of-the-art classifiers, including convolutional neural networks such as AlexNet and GoogLeNet, improving the classification rate by between 3.51% and 14.34%, depending on the dataset. © 2019 Elsevier B.V. All rights reserved.
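The codebook step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes local descriptors are already available as plain lists of floats (synthetic 2D toy vectors stand in for the speeded-up robust feature descriptors), it clusters them with spherical K-Means seeded by K-Means++, and it encodes a sample as a normalised histogram of nearest codewords. All function names (`normalize`, `kmeanspp_init`, `spherical_kmeans`, `encode`) and parameter values are hypothetical, and the random forest stage is left out.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: draw each new centre with probability proportional
    to its squared distance from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # On the unit sphere, squared Euclidean distance is 2 - 2 * cosine.
        d2 = [max(0.0, 2.0 - 2.0 * max(cosine(p, c) for c in centres))
              for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centres.append(p)
                break
        else:
            # Fallback for degenerate weights: take the farthest point.
            centres.append(points[d2.index(max(d2))])
    return centres

def assign(p, centres):
    """Index of the centre with the highest cosine similarity to p."""
    return max(range(len(centres)), key=lambda j: cosine(p, centres[j]))

def spherical_kmeans(descriptors, k, iters=20, seed=0):
    """Cluster unit-normalised descriptors; centres are kept on the unit sphere."""
    rng = random.Random(seed)
    points = [normalize(d) for d in descriptors]
    centres = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = assign(p, centres)
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
        # Re-project each cluster sum onto the sphere; empty clusters keep
        # their previous centre.
        centres = [normalize(sums[j]) if counts[j] else centres[j]
                   for j in range(k)]
    return centres

def encode(descriptors, centres):
    """Code vector: normalised histogram of nearest-codeword assignments."""
    hist = [0.0] * len(centres)
    for d in descriptors:
        hist[assign(normalize(d), centres)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data (assumption): two groups of 2D "descriptors" near the two axes.
toy_rng = random.Random(1)
toy = ([[1.0 + toy_rng.uniform(-0.1, 0.1), toy_rng.uniform(-0.1, 0.1)]
        for _ in range(20)]
       + [[toy_rng.uniform(-0.1, 0.1), 1.0 + toy_rng.uniform(-0.1, 0.1)]
          for _ in range(20)])
codebook = spherical_kmeans(toy, k=2)
code_vector = encode(toy[:20], codebook)
print(code_vector)
```

In a full pipeline along the lines of the abstract, one such code vector per spectrogram would be the input to the final classifier. The codebook size `k=2` is only suitable for this toy data.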
Pages: 13