SPY: a novel resampling method for improving classification performance in imbalanced data

Cited by: 13
Authors
Xuan Tho Dang [1]
Dang Hung Tran [1]
Hirose, Osamu [2]
Satou, Kenji [2]
Affiliations
[1] Hanoi Natl Univ Educ, Fac Informat Technol, Hanoi, Vietnam
[2] Kanazawa Univ, Inst Sci & Engn, Kanazawa, Ishikawa, Japan
Source
2015 SEVENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE) | 2015
Keywords
Imbalanced dataset; Over-sampling; Under-sampling; SMOTE; borderline-SMOTE; SUPPORT VECTOR MACHINES
DOI
10.1109/KSE.2015.24
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, class-imbalanced datasets have complicated the analysis and understanding of raw data that supports decision-making in many domains, especially biomedical data classification. Although a few class imbalance learning approaches have achieved promising results, existing methods have not yet solved the problem completely. SMOTE is a well-known, general-purpose over-sampling method for this problem; however, in some cases it fails to improve classification performance, and sometimes it reduces it. Therefore, we developed a novel method named SPY. Experimental results on five imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than the control method (i.e., no over-sampling), SMOTE, and several modified successors of SMOTE, including safe-level-SMOTE, safe-SMOTE, and borderline-SMOTE.
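The abstract reports sensitivity and G-mean but does not define them. The sketch below uses the common definitions (sensitivity as the true-positive rate on the minority class, G-mean as the geometric mean of sensitivity and specificity). It is an assumption that the paper uses exactly these definitions. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def sensitivity_and_g_mean(tp, fn, tn, fp):
    """Return (sensitivity, specificity, g_mean) from confusion-matrix counts.

    Assumes the minority class is the positive class. These are the
    commonly used definitions, not ones quoted from the paper.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true-negative rate
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Hypothetical counts: 10 minority samples and 90 majority samples.
sens, spec, g = sensitivity_and_g_mean(tp=8, fn=2, tn=81, fp=9)

# A classifier that always predicts the majority class is 90% accurate
# on the same data, yet its sensitivity and G-mean are both zero.
sens_all_neg, spec_all_neg, g_all_neg = sensitivity_and_g_mean(tp=0, fn=10, tn=90, fp=0)
```

The second call suggests why such metrics tend to be preferred over plain accuracy on imbalanced data: accuracy can look high while every minority sample is missed.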
Pages: 280-285
Page count: 6