Increasing diversity in random forest learning algorithm via imprecise probabilities

Cited by: 42
Authors
Abellan, Joaquin [1]
Mantas, Carlos J. [1]
Castellano, Javier G. [1]
Moral-Garcia, Serafín [1]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
Keywords
Classification; Ensemble schemes; Random forest; Imprecise probabilities; Uncertainty measures; DECISION TREES; NEURAL-NETWORKS; CLASS NOISE; ENSEMBLE; CLASSIFICATION; CLASSIFIERS; CREDAL-C4.5; PREDICTION; REGRESSION;
DOI
10.1016/j.eswa.2017.12.029
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Random Forest (RF) learning algorithm is considered a reference classifier due to its excellent performance. Its success rests on the diversity of the rules generated from decision trees that are built via a procedure that randomizes instances and features. Finding additional procedures to increase the diversity of the trees is therefore an interesting task. A new split criterion, based on imprecise probabilities and general uncertainty measures, has been considered; it clearly depends on a parameter and has been shown to be more successful than the classic criteria. Using that criterion in the RF scheme, together with a random procedure to select the value of that parameter, increases both the diversity of the trees in the forest and the performance. This gives rise to a new classification algorithm, called Random Credal Random Forest (RCRF). The new method offers several improvements over the classic RF: the use of a more successful split criterion that is more robust to noise than the classic ones, and an increase in randomness that facilitates the diversity of the rules obtained. An experimental study shows that this new algorithm is a clear enhancement of RF, especially when it is applied to data sets with class noise, where the standard RF deteriorates notably. The overfitting problem that appears when RF classifies data sets with class noise is solved by RCRF. The new algorithm can be considered a powerful alternative for use on data with or without class noise. (C) 2017 Elsevier Ltd. All rights reserved.
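The abstract centres on a credal split criterion whose behaviour depends on a parameter of the Imprecise Dirichlet Model (IDM), with RCRF drawing that parameter at random for each tree to increase diversity. A minimal sketch, assuming the standard IDM upper-entropy computation via "water-filling" of the extra mass s onto the least-frequent classes; the per-tree sampling range [0, 2] is an illustrative assumption, not a value taken from the paper:

```python
import numpy as np

def idm_max_entropy(counts, s=1.0):
    """Upper (maximum) entropy of the IDM credal set for class counts.

    The IDM widens each class probability to the interval
    [n_c / (N + s), (n_c + s) / (N + s)].  The maximum-entropy
    distribution in that credal set is obtained by water-filling:
    the extra mass s is poured onto the least-frequent classes,
    raising them level by level until the budget is exhausted.
    """
    c = np.asarray(counts, dtype=float).copy()
    n = c.sum()
    budget = float(s)
    while budget > 1e-12:
        m = c.min()
        low = np.isclose(c, m)            # classes currently at the bottom
        k = low.sum()
        others = c[~low]
        nxt = others.min() if others.size else np.inf
        need = k * (nxt - m)              # mass needed to reach the next level
        if need <= budget:
            c[low] = nxt
            budget -= need
        else:                             # spread the remaining mass evenly
            c[low] += budget / k
            budget = 0.0
    p = c / (n + s)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# RCRF-style randomness (assumed range for illustration): each tree in
# the forest draws its own s, so different trees can rank the same
# candidate split differently, increasing rule diversity.
rng = np.random.default_rng(0)
s_per_tree = rng.uniform(0.0, 2.0, size=100)
```

A split criterion would then replace Shannon entropy with `idm_max_entropy` in the usual information-gain formula; larger s widens the credal set and yields a higher, more cautious entropy estimate.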
Pages: 228-243
Page count: 16
Related Papers
52 items in total
[1]   AdaptativeCC4.5: Credal C4.5 with a rough class noise estimator [J].
Abellan, Joaquin ;
Mantas, Carlos J. ;
Castellano, Javier G. .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 92 :363-379
[3]   Disaggregated total uncertainty measure for credal sets [J].
Abellán, J ;
Klir, GJ ;
Moral, S .
INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2006, 35 (01) :29-44
[4]   Upper entropy of credal sets.: Applications to credal classification [J].
Abellán, J ;
Moral, S .
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2005, 39 (2-3) :235-255
[5]   Building classification trees using the total uncertainty criterion [J].
Abellán, J ;
Moral, S .
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2003, 18 (12) :1215-1225
[6]   Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring [J].
Abellan, Joaquin ;
Mantas, Carlos J. .
EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (08) :3825-3830
[8]   Bagging schemes on the presence of class noise in classification [J].
Abellan, Joaquin ;
Masegosa, Andres R. .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (08) :6827-6837
[9]   An ensemble method using credal decision trees [J].
Abellan, Joaquin ;
Masegosa, Andres R. .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2010, 205 (01) :218-226
[10]   A FILTER-WRAPPER METHOD TO SELECT VARIABLES FOR THE NAIVE BAYES CLASSIFIER BASED ON CREDAL DECISION TREES [J].
Abellan, Joaquin ;
Masegosa, Andres R. .
INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2009, 17 (06) :833-854