Cost-sensitive and hybrid-attribute measure multi-decision tree over imbalanced data sets

Cited by: 97
Authors
Li, Fenglian [1, 2]
Zhang, Xueying [1]
Zhang, Xiqian [1]
Du, Chunlei [1]
Xu, Yue [2]
Tian, Yu-Chu [1, 2]
Affiliations
[1] Taiyuan Univ Technol, Coll Informat Engn, Taiyuan 030024, Shanxi, Peoples R China
[2] Queensland Univ Technol, Sch Elect Engn & Comp Sci, GPO Box 2434, Brisbane, Qld 4001, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Multi-decision tree; Minority class; Imbalanced data set; Cost sensitivity; Hybrid attribute measure; CLASSIFICATION; CLASSIFIERS;
DOI
10.1016/j.ins.2017.09.013
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the most popular algorithms for classification is the decision tree. However, existing binary decision tree models do not handle the minority class well on imbalanced data sets. To address this difficulty, this paper presents a Cost-sensitive and Hybrid attribute measure Multi-Decision Tree (CHMDT) approach. It penalizes misclassification through a hybrid attribute measure, defined as a combination of the Gini index and the information gain measure. It further builds a multi-decision tree consisting of multiple decision trees, each with different root node information. The overall objective of the approach is to maximize classification performance with the hybrid attribute measure while minimizing the total misclassification cost. Experiments on twelve KEEL imbalanced data sets demonstrate the CHMDT approach. They show that classification performance on the minority class improves significantly without sacrificing the overall classification accuracy of the majority class. (C) 2017 Elsevier Inc. All rights reserved.
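The abstract does not give the formula for the hybrid attribute measure, so the following is only a minimal sketch of one plausible reading: a weighted sum of information gain and Gini gain for a candidate split. The function names and the weight `alpha` are hypothetical and are not taken from the paper; the paper's actual definition, and its cost-sensitive terms, may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(impurity, labels, groups):
    """Impurity of the parent minus the size-weighted impurity of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups)
    return impurity(labels) - weighted


def hybrid_measure(labels, groups, alpha=0.5):
    """ASSUMPTION: hybrid score = alpha * information gain + (1 - alpha) * Gini gain.

    `alpha` is a hypothetical mixing weight, not a parameter from the paper.
    """
    info_gain = split_gain(entropy, labels, groups)
    gini_gain = split_gain(gini, labels, groups)
    return alpha * info_gain + (1.0 - alpha) * gini_gain


# Usage: score two candidate splits of the same four samples.
labels = [0, 0, 1, 1]
perfect = hybrid_measure(labels, [[0, 0], [1, 1]])  # children are pure
useless = hybrid_measure(labels, [[0, 1], [0, 1]])  # children mirror the parent
```

Under this assumed definition, a split that separates the classes scores higher than one that leaves the class mix unchanged, and an attribute selection step would choose the split with the largest score.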
Pages: 242-256
Number of pages: 15