Learning From Imbalanced Data Using Triplet Adversarial Samples

Times Cited: 0
Authors
Yun, Jaesub [1 ]
Lee, Jong-Seok [1 ]
Affiliation
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore
Keywords
Data generation; Synthetic data; Data models; Optimization; Neural networks; Deep learning; Adaptation models; Class imbalance; triplet loss; adversarial samples; synthetic data generation; multiple classes; area under ROC; AREA; SMOTE;
DOI
10.1109/ACCESS.2023.3262604
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
The imbalance of classes in real-world datasets poses a major challenge in machine learning and classification, and traditional synthetic data generation methods often fail to address this problem effectively. A major limitation of these methods is that they tend to separate the process of generating synthetic samples from the training process, resulting in synthetic data that lack the necessary informative characteristics for proper model training. We present a new synthetic data generation method that addresses this issue by combining adversarial sample generation with a triplet loss method. This approach focuses on increasing the diversity in the minority class while preserving the integrity of the decision boundary. Furthermore, we show that reducing triplet loss is equivalent to maximizing the area under the receiver operating characteristic curve under specific conditions, providing a theoretical basis for the effectiveness of our method. In addition, we present a model training approach to further improve the generalization of the model to small classes by providing a diverse set of synthetic samples optimized using our proposed loss function. We evaluated our method on several imbalanced benchmark tasks and compared it to state-of-the-art techniques, demonstrating that our method can deliver even better performance, making it an effective solution to the class imbalance problem.
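The two ingredients named in the abstract, a triplet loss over anchor/positive/negative samples and adversarial perturbation of minority-class samples, can be sketched in toy form. This is an illustrative sketch only, not the paper's implementation: the squared-distance triplet loss, the FGSM-style sign step, the margin of 1.0, and the hand-picked 2-D points are all assumptions made for this example.

```python
def sq_dist(u, v):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge-style triplet loss: the positive (same class as the anchor)
    # should be closer to the anchor than the negative by at least `margin`.
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def adversarial_positive(anchor, positive, eps=0.1):
    # FGSM-style step on the positive sample: the gradient of
    # ||a - p||^2 with respect to p is 2 * (p - a), so stepping along
    # its sign pushes the synthetic sample away from the anchor,
    # raising the triplet loss and diversifying the minority class.
    grad = [2.0 * (p - a) for a, p in zip(anchor, positive)]
    step = [eps * (1.0 if g > 0 else -1.0 if g < 0 else 0.0) for g in grad]
    return [p + s for p, s in zip(positive, step)]

anchor   = [0.0, 0.0]   # minority-class anchor
positive = [0.5, 0.0]   # another minority-class sample
negative = [1.0, 0.0]   # majority-class sample

base   = triplet_loss(anchor, positive, negative)
adv    = adversarial_positive(anchor, positive)
harder = triplet_loss(anchor, adv, negative)
print(base, harder)  # the perturbed positive yields the larger loss
```

In the paper's setting, such harder synthetic minority samples would be generated during training and fed back into the model; the sketch above only shows the mechanics of the loss and the perturbation step, not the training loop or the AUC equivalence argument.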
Pages: 31467-31478
Page Count: 12