CCO: A Cluster Core-Based Oversampling Technique for Improved Class-Imbalanced Learning

Times cited: 7
Authors
Mondal, Priyobrata [1]
Ansari, Faizanuddin [1]
Das, Swagatam [1]
Affiliation
[1] Indian Statistical Institute, Electronics and Communication Sciences Unit, Kolkata 700108, India
Source
IEEE Transactions on Emerging Topics in Computational Intelligence | 2025 / Vol. 9 / No. 2
Keywords
Clustering algorithms; Noise measurement; Interpolation; Noise; Computational intelligence; Classification algorithms; Task analysis; Classification; imbalanced data; oversampling; synthetic minority oversampling technique; Mean shift; K-means; SMOTE
DOI
10.1109/TETCI.2024.3407784
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-world supervised classification problems often suffer from class imbalance: one or more target classes have far fewer samples than the remaining majority classes. To address this, we propose a clustering-based oversampling technique that populates the minority class with synthetic samples. The approach builds on the notion of "Cluster Cores," the locally dense regions within clusters. These cores act as central, densely populated areas that capture the intricate topological properties of their clusters, particularly in complex datasets whose clusters have non-convex shapes in the feature space. By concentrating on these high-density regions, the technique generates synthetic samples within the convex hull of the minority class instances in each formed cluster. This strategy produces points that align with the structure of the data space and takes every minority instance in a given cluster into account, thereby avoiding the problems that arise in SMOTE (Synthetic Minority Oversampling Technique)-based algorithms, which create artificial samples by merely interpolating linearly between minority class data points. To assess the efficacy of the proposal, we compared it experimentally against several state-of-the-art algorithms, using a range of evaluation metrics on well-known benchmark datasets for both binary and multi-class classification. We also conducted a detailed ablation study, examined existing algorithms in our context, outlined their strengths and limitations, and discussed potential research directions in this domain.
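The abstract contrasts two ways of placing synthetic minority points: inside the convex hull of a cluster's dense region versus on a line segment between minority points, as in SMOTE. The following is a minimal illustrative sketch under stated assumptions, not the authors' CCO algorithm. It assumes a cluster of minority points is already given, approximates a "cluster core" with a hypothetical k-nearest-neighbour density heuristic (`cluster_core`, with assumed parameters `k` and `frac`), and draws each synthetic point as a random convex combination of the core points, so every sample stays inside their convex hull. A SMOTE-style segment interpolation (`smote_sample`) is included only for comparison. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def knn_radius(points: Sequence[Point], i: int, k: int) -> float:
    """Distance from points[i] to its k-th nearest neighbour (excluding itself)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    if not dists:  # a single-point cluster has no neighbours
        return 0.0
    return dists[min(k, len(dists)) - 1]


def cluster_core(points: Sequence[Point], k: int = 3, frac: float = 0.5) -> List[Point]:
    """Hypothetical core heuristic (an assumption, not the paper's definition):
    keep the densest fraction of the cluster, where a smaller k-NN radius
    means higher local density."""
    order = sorted(range(len(points)), key=lambda i: knn_radius(points, i, k))
    n_keep = max(1, round(frac * len(points)))
    return [list(points[i]) for i in order[:n_keep]]


def convex_sample(points: Sequence[Point], rng: random.Random) -> Point:
    """Draw one point as a random convex combination of all given points.
    Normalised Exp(1) draws give Dirichlet(1, ..., 1) weights, so the result
    always lies inside the convex hull of `points`."""
    w = [rng.expovariate(1.0) for _ in points]
    total = sum(w)
    dim = len(points[0])
    return [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]


def smote_sample(a: Point, b: Point, rng: random.Random) -> Point:
    """SMOTE-style baseline for comparison: a random point on the segment a-b."""
    u = rng.random()
    return [ai + u * (bi - ai) for ai, bi in zip(a, b)]


def oversample_cluster(minority: Sequence[Point], n_new: int, seed: int = 0,
                       k: int = 3, frac: float = 0.5) -> List[Point]:
    """Generate n_new synthetic points inside the convex hull of the cluster core."""
    rng = random.Random(seed)
    core = cluster_core(minority, k=k, frac=frac)
    return [convex_sample(core, rng) for _ in range(n_new)]


if __name__ == "__main__":
    # A small minority cluster with one far-away point at (5, 5).
    cluster = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.2, 0.2], [0.3, 0.1], [5.0, 5.0]]
    print(cluster_core(cluster))
    print(oversample_cluster(cluster, n_new=4, seed=42))
```

In this toy cluster the core heuristic keeps only the three densest points and discards `[5.0, 5.0]`, so no synthetic point is pulled toward the outlier; SMOTE-style interpolation between a core point and that outlier could instead place samples anywhere along the connecting segment.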
Pages: 1153-1165
Number of pages: 13