An intra-class distribution-focused generative adversarial network approach for imbalanced tabular data learning

Cited by: 1
Authors
Chen, Qiuling [1, 2]
Ye, Ayong [1, 2]
Zhang, Yuexin [1, 2]
Chen, Jianwei [1, 2]
Huang, Chuan [1, 2]
Affiliations
[1] Fujian Normal University, College of Computer and Cyber Security, Fuzhou 350007, People's Republic of China
[2] Fujian Provincial Key Laboratory of Network Security and Cryptology, Fuzhou 350007, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data learning; Oversampling; Clustering; Generative adversarial network; Hybrid approach; SMOTE
DOI
10.1007/s13042-023-02048-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data imbalance adversely affects the performance of machine learning algorithms: it shifts decision boundaries, biasing predictions towards the majority class and causing minority-class samples to be misclassified. Although oversampling the minority class with deep generative models is a popular strategy, many existing methods focus solely on augmenting minority-class data and overlook the distributional relationships within and between classes. We therefore propose an oversampling method that combines unsupervised clustering with a generative adversarial network (GAN) to facilitate learning from imbalanced tabular data. First, we preprocess the original data by clustering, remove clusters that do not require sampling, and generate more samples for sparsely distributed minority-class clusters to balance samples within the minority class. Second, we design a CTGAN-based auxiliary classifier GAN (ACCTGAN) to generate minority-class samples; it enhances the semantic integrity of the synthetic data and avoids generating noisy samples. We conducted validation experiments comparing our approach with 7 typical methods on 12 real tabular datasets. Our method performs strongly in F1-measure and area under the curve (AUC), obtaining 19 and 20 best results, respectively, across the three classifiers. It significantly improves classification results and demonstrates good robustness and stability.
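The cluster-level preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how clusters are filtered or how sparsity is measured, so the function name `allocate_synthetic_samples`, the `min_minority_ratio` filter, and the inverse-count sparsity weight are all assumptions made here for illustration. The sketch only decides how many synthetic minority samples each cluster should receive; the generation step itself (ACCTGAN in the paper) is out of scope.

```python
from collections import Counter
from fractions import Fraction


def allocate_synthetic_samples(cluster_ids, labels, minority_label,
                               n_to_generate, min_minority_ratio=0.5):
    """Split n_to_generate synthetic minority samples across clusters.

    Assumed rules (hypothetical, not taken from the paper):
      * a cluster is skipped when its minority share is below
        min_minority_ratio or it holds no minority samples;
      * each kept cluster is weighted by 1 / (its minority count), so
        sparser minority clusters receive more synthetic samples.
    """
    totals = Counter(cluster_ids)
    minority = Counter(c for c, y in zip(cluster_ids, labels)
                       if y == minority_label)

    # Step 1: drop clusters that do not require sampling (assumed filter).
    kept = [c for c in sorted(totals)
            if minority[c] > 0
            and Fraction(minority[c], totals[c]) >= min_minority_ratio]
    if not kept:
        return {}

    # Step 2: sparsity weight, larger for clusters with fewer minority samples.
    weights = {c: Fraction(1, minority[c]) for c in kept}
    total_weight = sum(weights.values())

    # Step 3: proportional allocation, rounded with the largest-remainder rule
    # so that the per-cluster counts sum exactly to n_to_generate.
    raw = {c: n_to_generate * weights[c] / total_weight for c in kept}
    alloc = {c: int(raw[c]) for c in kept}
    leftover = n_to_generate - sum(alloc.values())
    by_remainder = sorted(kept, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# Toy data: cluster 0 is a dense minority cluster, cluster 1 a sparse one,
# and cluster 2 is dominated by the majority class (label 0).
cluster_ids = [0] * 6 + [1] * 3 + [2] * 10
labels = [1] * 6 + [1, 1, 0] + [1] + [0] * 9
print(allocate_synthetic_samples(cluster_ids, labels, minority_label=1,
                                 n_to_generate=8))
# prints {0: 2, 1: 6}
```

In this toy run the majority-dominated cluster is skipped and the sparse minority cluster receives three times as many synthetic samples as the dense one, which is the within-class balancing effect the abstract describes.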
Pages: 2551-2572
Page count: 22