DBSMOTE: Density-Based Synthetic Minority Over-sampling TEchnique

Cited by: 256
Authors
Bunkhumpornpat, Chumphol [1]
Sinapiromsaran, Krung [1]
Lursinsap, Chidchanok [1]
Affiliations
[1] Chulalongkorn Univ, Fac Sci, Dept Math, Bangkok 10330, Thailand
Keywords
Classification; Class imbalance; Over-sampling; Density-based; SYSTEM; SMOTE
DOI
10.1007/s10489-011-0287-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A dataset exhibits the class imbalance problem when a target class has a very small number of instances relative to other classes. A trivial classifier typically fails to detect a minority class due to its extremely low incidence rate. In this paper, a new over-sampling technique called DBSMOTE is proposed. Our technique relies on a density-based notion of clusters and is designed to over-sample an arbitrarily shaped cluster discovered by DBSCAN. DBSMOTE generates synthetic instances along a shortest path from each positive instance to a pseudo-centroid of a minority-class cluster. Consequently, these synthetic instances are dense near this centroid and are sparse far from this centroid. Our experimental results show that DBSMOTE improves precision, F-value, and AUC more effectively than SMOTE, Borderline-SMOTE, and Safe-Level-SMOTE for imbalanced datasets.
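The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the cluster is taken as already discovered by DBSCAN, the pseudo-centroid is assumed to be the cluster member nearest the cluster mean, the graph is assumed to connect members within a hypothetical radius `eps`, and each synthetic instance is placed at a random position on a randomly chosen edge of the shortest path. The function names are illustrative only.

```python
import heapq
import math
import random


def pseudo_centroid(cluster):
    """Index of the cluster member closest to the cluster mean (assumed definition)."""
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return min(range(len(cluster)), key=lambda i: math.dist(cluster[i], mean))


def shortest_paths_to(cluster, target, eps):
    """Dijkstra from `target` over the graph joining members at most eps apart.

    Returns pred, where pred[i] is the next node on the shortest path from
    i towards target (None for target itself and for unreachable nodes).
    """
    n = len(cluster)
    dist = [math.inf] * n
    pred = [None] * n
    dist[target] = 0.0
    heap = [(0.0, target)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            w = math.dist(cluster[u], cluster[v])
            if v != u and w <= eps and d + w < dist[v]:
                dist[v] = d + w
                pred[v] = u
                heapq.heappush(heap, (d + w, v))
    return pred


def dbsmote_like(cluster, eps, rng=None):
    """One synthetic instance per member, placed on its path to the pseudo-centroid."""
    rng = rng or random.Random(0)
    c = pseudo_centroid(cluster)
    pred = shortest_paths_to(cluster, c, eps)
    synthetic = []
    for i in range(len(cluster)):
        if i == c or pred[i] is None:
            continue  # skip the centroid itself and unreachable members
        path = [i]
        while path[-1] != c:
            path.append(pred[path[-1]])
        k = rng.randrange(len(path) - 1)  # random edge on the path
        a, b = cluster[path[k]], cluster[path[k + 1]]
        t = rng.random()  # random position along that edge
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


if __name__ == "__main__":
    minority_cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    for point in dbsmote_like(minority_cluster, eps=1.5):
        print(point)
```

Because every path ends at the pseudo-centroid, edges near it are shared by more paths than edges at the periphery, which is one way to see why the synthetic instances come out denser near the centroid and sparser far from it.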
Pages: 664-684
Number of pages: 21