LDAS: Local density-based adaptive sampling for imbalanced data classification

Cited by: 42
Authors
Yan, Yuanting [1]
Jiang, Yifei [1]
Zheng, Zhong [1]
Yu, Chengjin [2]
Zhang, Yiwen [1]
Zhang, Yanping [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Minist Educ, Key Lab Intelligent Comp & Signal Proc, Hefei 230601, Anhui, Peoples R China
[2] Zhejiang Univ, Coll Opt Sci & Engn, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Imbalanced classification; Local density; Overlapping data; Re-sampling; MINORITY OVERSAMPLING TECHNIQUE; SMOTE; GENERATION; ENSEMBLE; NOISY;
DOI
10.1016/j.eswa.2021.116213
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance poses a great challenge to traditional machine learning classifiers, which strongly favor the majority class while ignoring the minority class. Synthetic over-sampling methods address this problem by generating synthetic examples to balance the data distribution. However, most existing methods generate synthetic examples in one specific area without considering the complexity of the imbalanced distribution, which may cause the learning model to over-emphasize certain data difficulty factors. To this end, we propose a local density-based adaptive sampling method (LDAS) for imbalanced data. LDAS first assigns a local density to each minority example and then applies a new cleaning strategy to remove overlapping majority examples. Finally, it weights each minority example based on its proximity to the decision boundary and its local density, so that synthetic examples are generated in the safe area and the border area simultaneously, according to the weights of the minority examples. Extensive experiments on KEEL datasets demonstrate the effectiveness of the proposed LDAS.
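The three stages named in the abstract (local density, cleaning of overlapping majority examples, weighted generation) can be illustrated with a rough NumPy sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the function name `ldas_like_oversample`, the inverse mean k-NN distance used as local density, the k-NN radius used for cleaning, the additive weighting, and the SMOTE-style interpolation. It should not be read as the authors' algorithm.

```python
import numpy as np

def ldas_like_oversample(X_min, X_maj, k=5, n_new=None, seed=0):
    """Illustrative sketch only; every formula below is an assumption."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    k = min(k, len(X_min) - 1)  # needs at least two minority examples

    # Step 1 (assumed): local density of each minority example is the
    # inverse of its mean distance to its k nearest minority neighbours.
    d_mm = np.linalg.norm(X_min[:, None] - X_min[None], axis=2)
    np.fill_diagonal(d_mm, np.inf)
    nn_idx = np.argsort(d_mm, axis=1)[:, :k]
    knn_dist = np.take_along_axis(d_mm, nn_idx, axis=1)
    radius = knn_dist.max(axis=1)  # distance to the k-th minority neighbour
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    # Step 2 (assumed cleaning rule): drop majority examples that fall
    # inside the k-NN radius of any minority example.
    d_mM = np.linalg.norm(X_min[:, None] - X_maj[None], axis=2)
    overlapping = (d_mM <= radius[:, None]).any(axis=0)
    X_maj_clean = X_maj[~overlapping]

    # Step 3 (assumed weighting): add a normalised border-closeness term
    # and a normalised density term, so that both border and dense (safe)
    # minority examples can be picked as seeds.
    if len(X_maj_clean):
        d_border = np.linalg.norm(
            X_min[:, None] - X_maj_clean[None], axis=2).min(axis=1)
        closeness = 1.0 / (d_border + 1e-12)
    else:
        closeness = np.ones(len(X_min))
    weights = closeness / closeness.sum() + density / density.sum()
    weights /= weights.sum()

    # Step 4 (assumed generation): SMOTE-style interpolation between a
    # weighted-sampled seed and one of its k minority neighbours.
    if n_new is None:
        n_new = max(len(X_maj_clean) - len(X_min), 0)
    seeds = rng.choice(len(X_min), size=n_new, p=weights)
    mates = nn_idx[seeds, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    X_syn = X_min[seeds] + gaps * (X_min[mates] - X_min[seeds])
    return X_syn, X_maj_clean
```

Calling `ldas_like_oversample(X_min, X_maj)` with two arrays of shape `(n, d)` returns the synthetic minority examples and the cleaned majority set; by default it generates enough examples to match the cleaned majority size.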
Pages: 13