SPY: a novel resampling method for improving classification performance in imbalanced data

Cited by: 13

Authors:

Xuan Tho Dang [1]

Dang Hung Tran [1]

Osamu Hirose [2]

Kenji Satou [2]

Affiliations:

[1] Hanoi Natl Univ Educ, Fac Informat Technol, Hanoi, Vietnam

[2] Kanazawa Univ, Inst Sci & Engn, Kanazawa, Ishikawa, Japan

Source:

2015 Seventh International Conference on Knowledge and Systems Engineering (KSE) | 2015

Keywords:

Imbalanced dataset; Over-sampling; Under-sampling; SMOTE; Borderline-SMOTE; Support vector machines

DOI:

10.1109/KSE.2015.24

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, imbalanced class datasets have caused many difficulties in the analysis and understanding of raw data that supports decision-making in many domains, especially in biomedical data classification. Although a few approaches have achieved promising results by applying class imbalance learning methods, the problem has not yet been completely solved by existing methods. SMOTE is a well-known, general over-sampling method for addressing this problem; however, in some cases it fails to improve, or even reduces, classification performance. Therefore, we developed a novel method named SPY. Experimental results on five imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than the control method (i.e., no over-sampling), SMOTE, and several modified successors of SMOTE, including safe-level-SMOTE, safe-SMOTE, and borderline-SMOTE.
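The abstract reports results as sensitivity and G-mean. As an illustration only (not the authors' evaluation code), the sketch below computes these metrics with their standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and G-mean = sqrt(sensitivity × specificity). It assumes binary labels with the minority class treated as the positive class; the function name `sensitivity_specificity_gmean` and the toy labels are hypothetical.

```python
import math


def sensitivity_specificity_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, G-mean) for binary labels.

    Assumption: the minority class is the positive class.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            # Minority sample: correctly detected (TP) or missed (FN).
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            # Majority sample: wrongly flagged (FP) or correctly rejected (TN).
            if p == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    spec = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    return sens, spec, math.sqrt(sens * spec)


# Hypothetical toy data: 4 minority (1) and 6 majority (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(sensitivity_specificity_gmean(y_true, y_pred))

# A classifier that always predicts the majority class.
print(sensitivity_specificity_gmean(y_true, [0] * 10))
```

In the second call the classifier is still 60% accurate, yet its sensitivity, and therefore its G-mean, is 0. This illustrates why sensitivity and G-mean, rather than plain accuracy, are commonly reported for imbalanced data.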


Pages: 280-285

Number of pages: 6
