Improving imbalanced medical image classification through GAN-based data augmentation methods

Times cited: 1
Authors
Ding, Hongwei [1, 2]
Huang, Nana [3]
Wu, Yaoxin [4]
Cui, Xiaohui [4]
Affiliations
[1] Northeastern Univ Qinhuangdao, Sch Comp & Commun Engn, Qinhuangdao 066000, Peoples R China
[2] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066000, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Cyberspace, Hangzhou 310018, Peoples R China
[4] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan 430072, Peoples R China
Keywords
Imbalanced data; Generative adversarial networks; Intra-class mode collapse; Data augmentation
DOI
10.1016/j.patcog.2025.111680
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the medical field, data imbalance is prevalent and severely degrades the performance of machine learning models. Traditional data augmentation methods struggle to generate sufficiently diverse augmented samples. Generative Adversarial Networks (GANs) can produce more effective new samples by learning the global distribution of the data. Although existing GAN models can balance inter-class distributions, sparse samples within a class can cause intra-class mode collapse, leaving the models unable to fit the distribution of sparse regions. To address this, our study proposes a two-step solution. First, we employ the Cluster-Based Local Outlier Factor (CBLOF) algorithm to identify sparse and dense samples within each class, and then train the GAN with these sparse and dense samples as conditions so that it focuses on fitting the sparse intra-class samples. Second, after training the GAN, we use the One-Class SVM (OCS) algorithm as a noise filter to obtain clean augmented samples. We conducted extensive validation experiments on four medical datasets: BloodMNIST, OrganCMNIST, PathMNIST, and PneumoniaMNIST. The experimental results indicate that the proposed method generates samples with greater diversity and higher quality. Furthermore, adding the augmented samples improved accuracy by approximately 3% across the four datasets.
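The non-GAN parts of the pipeline described above (CBLOF-based sparse/dense tagging within a class, and One-Class SVM filtering of generated samples) can be sketched in Python with scikit-learn's `KMeans` and `OneClassSVM`. This is a minimal illustration under stated assumptions, not the authors' implementation. The CBLOF scorer is an unweighted simplification, because the original CBLOF definition also multiplies each score by cluster size. `alpha` and `beta` play the roles of the large/small cluster thresholds from that definition, with illustrative values. The rule that tags the top `sparse_frac` of scores as sparse is hypothetical. The conditional GAN itself is omitted: the 0/1 tag is simply what would be passed to the generator as an extra condition. The helper names `cblof_scores`, `split_sparse_dense`, and `ocs_filter` and all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM


def cblof_scores(X, n_clusters=8, alpha=0.9, beta=5.0, seed=0):
    """Unweighted CBLOF-style scores for one class; higher means sparser.

    Assumption: the original CBLOF also weights each score by cluster size;
    that weighting is dropped here for simplicity.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    centers = km.cluster_centers_
    sizes = np.bincount(labels, minlength=n_clusters)
    order = np.argsort(sizes)[::-1]  # cluster ids, largest first

    # Boundary b between "large" and "small" clusters: large clusters cover
    # at least alpha of the samples, or the next cluster is beta times smaller.
    cum = np.cumsum(sizes[order])
    b = len(order)
    for i in range(len(order) - 1):
        if cum[i] >= alpha * len(X) or sizes[order[i]] >= beta * sizes[order[i + 1]]:
            b = i + 1
            break
    large = order[:b]

    # Large-cluster points: distance to their own center.
    # Small-cluster points: distance to the nearest large-cluster center.
    d_own = np.linalg.norm(X - centers[labels], axis=1)
    d_large = np.linalg.norm(X[:, None, :] - centers[large][None, :, :], axis=2).min(axis=1)
    return np.where(np.isin(labels, large), d_own, d_large)


def split_sparse_dense(X, sparse_frac=0.2, **kwargs):
    """Tag the top `sparse_frac` CBLOF scores as sparse (1), the rest dense (0).

    Hypothetical rule: the tag would be fed to a conditional GAN as an extra
    condition alongside the class label.
    """
    scores = cblof_scores(X, **kwargs)
    threshold = np.quantile(scores, 1.0 - sparse_frac)
    return (scores >= threshold).astype(int)


def ocs_filter(real, generated, nu=0.1, gamma="scale"):
    """Keep only generated samples that a One-Class SVM fitted on real data accepts."""
    ocs = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(real)
    return generated[ocs.predict(generated) == 1]  # predict: +1 inlier, -1 outlier


# Usage on toy 2-D features standing in for one class of flattened images.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (180, 2)), rng.normal(4.0, 0.3, (20, 2))])
cond = split_sparse_dense(X, sparse_frac=0.1, n_clusters=4)
# Stand-in for GAN output: plausible samples plus far-away noise.
fake = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.uniform(15.0, 20.0, (50, 2))])
clean = ocs_filter(X, fake)
print(cond.sum(), len(clean))
```

The final line reports how many real samples were tagged sparse and how many generated samples survived filtering. With real data, `X` would hold feature vectors for a single class, and `fake` would come from the trained conditional generator rather than a random stand-in.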
Pages: 14