Learning From Imbalanced Data Using Triplet Adversarial Samples

Times Cited: 0
Authors
Yun, Jaesub [1 ]
Lee, Jong-Seok [1 ]
Affiliation
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore
Keywords
Data generation; Synthetic data; Data models; Optimization; Neural networks; Deep learning; Adaptation models; Class imbalance; triplet loss; adversarial samples; synthetic data generation; multiple classes; area under ROC; AREA; SMOTE;
DOI
10.1109/ACCESS.2023.3262604
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
The imbalance of classes in real-world datasets poses a major challenge in machine learning and classification, and traditional synthetic data generation methods often fail to address this problem effectively. A major limitation of these methods is that they tend to separate the process of generating synthetic samples from the training process, resulting in synthetic data that lack the necessary informative characteristics for proper model training. We present a new synthetic data generation method that addresses this issue by combining adversarial sample generation with a triplet loss method. This approach focuses on increasing the diversity in the minority class while preserving the integrity of the decision boundary. Furthermore, we show that reducing triplet loss is equivalent to maximizing the area under the receiver operating characteristic curve under specific conditions, providing a theoretical basis for the effectiveness of our method. In addition, we present a model training approach to further improve the generalization of the model to small classes by providing a diverse set of synthetic samples optimized using our proposed loss function. We evaluated our method on several imbalanced benchmark tasks and compared it to state-of-the-art techniques, demonstrating that our method can deliver even better performance, making it an effective solution to the class imbalance problem.
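The two ingredients named in the abstract, a triplet loss over anchor/positive/negative samples and adversarial perturbation of minority-class samples, can be sketched in toy form. This is an illustrative sketch only, not the paper's implementation: the squared-distance triplet loss, the FGSM-style sign step, the margin of 1.0, and the hand-picked 2-D points are all assumptions made for this example.

```python
def sq_dist(u, v):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge-style triplet loss: the positive (same class as the anchor)
    # should be closer to the anchor than the negative by at least `margin`.
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def adversarial_positive(anchor, positive, eps=0.1):
    # FGSM-style step on the positive sample: the gradient of
    # ||a - p||^2 with respect to p is 2 * (p - a), so stepping along
    # its sign pushes the synthetic sample away from the anchor,
    # raising the triplet loss and diversifying the minority class.
    grad = [2.0 * (p - a) for a, p in zip(anchor, positive)]
    step = [eps * (1.0 if g > 0 else -1.0 if g < 0 else 0.0) for g in grad]
    return [p + s for p, s in zip(positive, step)]

anchor   = [0.0, 0.0]   # minority-class anchor
positive = [0.5, 0.0]   # another minority-class sample
negative = [1.0, 0.0]   # majority-class sample

base   = triplet_loss(anchor, positive, negative)
adv    = adversarial_positive(anchor, positive)
harder = triplet_loss(anchor, adv, negative)
print(base, harder)  # the perturbed positive yields the larger loss
```

In the paper's setting, such harder synthetic minority samples would be generated during training and fed back into the model; the sketch above only shows the mechanics of the loss and the perturbation step, not the training loop or the AUC equivalence argument.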
Pages: 31467-31478
Page Count: 12