Data Augmentation Generated by Generative Adversarial Network for Small Sample Datasets Clustering

Cited by: 7
Authors
Yu, Hui [1]
Wang, Qiao Feng [1]
Shi, Jian Yu [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Life Sci, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data mining; Clustering; Data augmentation; Generation adversarial network; Small sample dataset; INTELLIGENCE;
DOI
10.1007/s11063-023-11315-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In data mining, clustering performance depends heavily on the number of samples. However, obtaining enough data samples is difficult and expensive in some applications. To address this problem, data augmentation methods such as oversampling have been adopted, but these methods focus mainly on local information in the data and do not consider its underlying distribution. This paper proposes a new data augmentation method, the Wasserstein Generative Adversarial Network based on the Gaussian Mixture Model (GMM_WGAN), which generates data for small-sample datasets to address insufficient dataset size in clustering. The method has two steps: first, a Gaussian Mixture Model captures the underlying distribution of the real dataset; second, a Wasserstein generative adversarial network generates data samples to expand the small dataset. We evaluate GMM_WGAN with five clustering algorithms and compare it with seven other data augmentation methods. Experiments on 10 small datasets show that the proposed approach achieves better results than the other methods on five evaluation metrics.
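The first of the two steps named in the abstract, capturing the distribution of a small dataset with a Gaussian mixture, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D toy data, the function names `fit_gmm_1d` and `sample_gmm_1d`, the fixed choice of two components, and the plain EM fit. The second step (the Wasserstein GAN) is not shown; drawing new points directly from the fitted mixture is only a simple stand-in for it, since this record does not describe how the two stages are connected.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative helper).

    Returns a list of (weight, mean, variance) tuples.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic initialisation: spread the initial means over the sorted data.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    return list(zip(weights, means, variances))


def sample_gmm_1d(components, n_samples, rng):
    """Draw n_samples points from the fitted mixture (stand-in for a generator)."""
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(n_samples):
        _, mean, var = rng.choices(components, weights=weights, k=1)[0]
        samples.append(rng.gauss(mean, math.sqrt(var)))
    return samples


rng = random.Random(0)
# Toy "small sample" dataset: two well-separated groups of 15 points each.
small_data = [rng.gauss(0.0, 0.5) for _ in range(15)] + [rng.gauss(5.0, 0.5) for _ in range(15)]
components = fit_gmm_1d(small_data, k=2)
augmented = small_data + sample_gmm_1d(components, 100, rng)
```

In this sketch the augmented set keeps the 30 original points and adds 100 synthetic ones drawn from the fitted mixture, so a clustering algorithm would see a denser version of the same two groups.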
Pages: 8365-8384
Page count: 20