CCO: A Cluster Core-Based Oversampling Technique for Improved Class-Imbalanced Learning

Cited by: 7
Authors
Mondal, Priyobrata [1]
Ansari, Faizanuddin [1]
Das, Swagatam [1]
Affiliations
[1] Indian Statistical Institute, Electronics and Communication Sciences Unit, Kolkata 700108, India
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025, Vol. 9, No. 2
Keywords
Clustering algorithms; Noise measurement; Interpolation; Noise; Computational intelligence; Classification algorithms; Task analysis; Classification; Imbalanced data; Oversampling; Synthetic minority oversampling technique; Mean shift; K-means; SMOTE
DOI
10.1109/TETCI.2024.3407784
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Supervised classification problems from the real world typically face a challenge characterized by the scarcity of samples in one or more target classes compared to the rest of the majority classes. In response to such class imbalance, we propose an oversampling technique based on clustering, aiming to populate the minority class with synthetic samples. This approach capitalizes on the notion of "Cluster Cores," representing locally dense regions within clusters. These Cluster Cores act as central, densely crowded areas that capture intricate topological properties of the corresponding clusters, especially in complex datasets with a non-convex spatial orientation in the feature space. By concentrating on these high-density regions, our clustering-based oversampling technique generates synthetic samples within the convex hull region of minority class instances in the formed clusters. This strategy ensures the creation of points that align with the data space and considers each minority instance within a specific cluster, thereby averting the problems encountered due to the generation of artificial samples by mere linear combination of the minority class data points, as is encountered in SMOTE (Synthetic Minority Oversampling Technique)-based algorithms. To assess the efficacy of our proposal, we conducted experimental comparisons against several cutting-edge algorithms, considering an array of evaluation metrics on well-known datasets used in the literature for both binary and multi-class classification. Additionally, we undertook a detailed ablation study, scrutinized existing algorithms in our context, delineated their strengths and limitations, and contemplated potential research directions in this domain.
Pages: 1153-1165
Number of pages: 13
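The abstract describes generating synthetic minority samples inside the convex hull of the minority instances that share a cluster. The following is a minimal Python sketch of that one idea only, not the authors' CCO algorithm: the abstract does not say how Cluster Cores are found, so cluster assignments are taken as given input here. The function names `convex_sample` and `oversample_by_cluster`, the round-robin choice of cluster, and the use of uniform Dirichlet weights are all assumptions made for illustration.

```python
import random


def convex_sample(points, rng):
    """Return one random convex combination of `points` (hypothetical helper)."""
    # Normalised exponential draws give uniform Dirichlet weights:
    # all weights are non-negative and sum to 1.
    raw = [rng.expovariate(1.0) for _ in points]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(points[0])
    # A convex combination always lies inside the convex hull of the inputs.
    return tuple(
        sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)
    )


def oversample_by_cluster(minority, cluster_ids, n_new, seed=0):
    """Generate `n_new` synthetic points, each inside one cluster's hull.

    `minority` is a list of feature tuples and `cluster_ids` holds the
    (assumed, precomputed) cluster label of each minority point.
    """
    rng = random.Random(seed)
    clusters = {}
    for point, cid in zip(minority, cluster_ids):
        clusters.setdefault(cid, []).append(point)
    ids = sorted(clusters)
    synthetic = []
    for i in range(n_new):
        cid = ids[i % len(ids)]  # assumption: visit clusters round-robin
        synthetic.append((cid, convex_sample(clusters[cid], rng)))
    return synthetic
```

Because every synthetic point mixes only members of a single cluster, no sample is placed on a line segment that bridges two separate minority regions, which is the failure mode the abstract attributes to plain linear interpolation between minority points.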