NCLWO: Newton's cooling law-based weighted oversampling algorithm for imbalanced datasets with feature noise

Cited by: 2
Authors
Tao, Liangliang [1 ]
Wang, Qingya [1 ,2 ,3 ]
Zhu, Zhicheng [1 ]
Yu, Fen [1 ]
Yin, Xia [1 ]
Affiliations
[1] Jiangxi Polytech Univ, Informat Engn Coll, Jiujiang 332007, Jiangxi, Peoples R China
[2] East China Univ Technol, Sch Earth Sci, Nanchang 330013, Jiangxi, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced classification; Oversampling technique; Newton's cooling law; Feature noise; CLASSIFICATION; SMOTE;
DOI
10.1016/j.neucom.2024.128538
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Imbalanced datasets pose challenges to standard classification algorithms. Although oversampling techniques can balance the number of samples across classes, the difficulty of imbalanced classification stems not only from the imbalance itself but also from other factors, such as small disjuncts and overlapping regions, especially in the presence of noise. Traditional oversampling techniques do not effectively address these intricacies. To this end, we propose a novel oversampling method called Newton's Cooling Law-Based Weighted Oversampling (NCLWO). The proposed method first calculates a weight for each minority class sample based on density and closeness factors to identify hard-to-learn samples, assigning them higher heat. Subsequently, Newton's Cooling Law is applied to each minority class sample: using the sample as the center, the sampling region is expanded outward while the heat gradually decreases until a balanced state is reached. Finally, majority class samples within the sampling region are translated to eliminate overlapping areas, and a weighted oversampling approach is employed to synthesize informative minority class samples. An experimental study carried out on a set of benchmark datasets confirms that the proposed method not only outperforms state-of-the-art oversampling approaches but also shows greater robustness in the presence of feature noise.
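The abstract describes the method only at a high level. The Python snippet below is a minimal, illustrative sketch of the three ingredients it names: per-sample heat derived from density and closeness factors, Newton's cooling law T(t) = T_env + (T_0 - T_env) e^(-k t) used to decay that heat toward an ambient (balanced) value, and heat-weighted interpolation to synthesize minority samples. The concrete factor definitions, the parameter names (k, cooling_rate, steps), and the use of scikit-learn's NearestNeighbors are assumptions made for illustration rather than the authors' implementation; the translation of majority samples out of the sampling region is omitted.

# Illustrative sketch only (assumptions noted above), not the NCLWO implementation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ncl_weighted_oversample_sketch(X_min, X_maj, k=5, cooling_rate=0.5, steps=3, seed=0):
    rng = np.random.default_rng(seed)
    # Density factor (assumed form): mean distance to k minority neighbours,
    # so samples in sparse minority regions count as harder to learn.
    nn_min = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    d_min, _ = nn_min.kneighbors(X_min)
    density = d_min[:, 1:].mean(axis=1)
    # Closeness factor (assumed form): inverse of the mean distance to the
    # nearest majority samples, so samples near the class boundary heat up.
    nn_maj = NearestNeighbors(n_neighbors=min(k, len(X_maj))).fit(X_maj)
    d_maj, _ = nn_maj.kneighbors(X_min)
    closeness = 1.0 / (d_maj.mean(axis=1) + 1e-12)
    # Initial heat: hard-to-learn samples (sparse and close to the majority) get more.
    heat = density * closeness
    heat = heat / heat.sum()
    # Newton's cooling law: decay each sample's heat toward the ambient value.
    ambient = heat.mean()
    cooled = ambient + (heat - ambient) * np.exp(-cooling_rate * steps)
    weights = cooled / cooled.sum()
    # Heat-weighted oversampling: interpolate between a sampled seed point and
    # one of its minority neighbours, as in SMOTE-style interpolation.
    n_new = len(X_maj) - len(X_min)
    idx = rng.choice(len(X_min), size=n_new, p=weights)
    _, nbr_idx = nn_min.kneighbors(X_min[idx])
    partners = X_min[nbr_idx[np.arange(n_new), rng.integers(1, nbr_idx.shape[1], n_new)]]
    gap = rng.random((n_new, 1))
    return X_min[idx] + gap * (partners - X_min[idx])

A usage note on the sketch: passing the minority and majority feature matrices returns len(X_maj) - len(X_min) synthetic minority samples, so appending them to the training set balances the two classes; the cooling step only redistributes sampling weight, it never removes a minority sample from consideration.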
Pages: 16