Increasing diversity in random forest learning algorithm via imprecise probabilities

Cited by: 42

Authors:

Abellan, Joaquin [1]

Mantas, Carlos J. [1]

Castellano, Javier G. [1]

Moral-Garcia, Serafin [1]

Affiliation:

[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2018 / Vol. 97

Keywords:

Classification; Ensemble schemes; Random forest; Imprecise probabilities; Uncertainty measures; DECISION TREES; NEURAL-NETWORKS; CLASS NOISE; ENSEMBLE; CLASSIFICATION; CLASSIFIERS; CREDAL-C4.5; PREDICTION; REGRESSION;

DOI:

10.1016/j.eswa.2017.12.029

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The Random Forest (RF) learning algorithm is considered a reference classifier due to its excellent performance. Its success is based on the diversity of the rules generated by decision trees that are built via a procedure that randomizes instances and features. Finding additional procedures to increase the diversity of the trees is therefore an interesting task. A new split criterion has been proposed, based on imprecise probabilities and general uncertainty measures, which depends explicitly on a parameter and has been shown to be more successful than the classic ones. Using that criterion in the RF scheme, together with a random procedure to select the value of that parameter, increases both the diversity of the trees in the forest and the performance. This gives rise to a new classification algorithm, called Random Credal Random Forest (RCRF). The new method offers several improvements over the classic RF: it uses a more successful split criterion that is more robust to noise than the classic ones, and it increases the randomness, which promotes the diversity of the rules obtained. An experimental study shows that this new algorithm clearly improves on RF, especially when it is applied to data sets with class noise, where the standard RF deteriorates notably. RCRF solves the overfitting problem that appears when RF classifies data sets with class noise. This new algorithm can be considered a powerful alternative for data with or without class noise. (C) 2017 Elsevier Ltd. All rights reserved.
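The abstract describes the RCRF idea only in words. The Python sketch below illustrates it under stated assumptions and is not the authors' implementation. It assumes the imprecise-probability split criterion is an Imprecise Info-Gain of the kind used in credal decision trees: class probabilities come from an Imprecise Dirichlet Model (IDM) credal set with parameter s, and the uncertainty measure is the maximum (upper) Shannon entropy over that set, computed here by water-filling. The function names `max_entropy_idm`, `imprecise_info_gain`, `choose_split`, `sample_tree_s` and the parameters `n_try` and `s_max` (default 2.0) are hypothetical; in particular, the abstract does not state the range from which the parameter is drawn.

```python
import math
import random
from collections import Counter


def max_entropy_idm(counts, s):
    """Maximum Shannon entropy (bits) over the IDM credal set for `counts`.

    The IDM with parameter s bounds each class probability below by
    n_j / (N + s); the free mass s / (N + s) is poured onto the smallest
    probabilities until they share a common level (water-filling).
    """
    denom = sum(counts) + s
    if denom == 0:
        return 0.0
    lower = sorted(n / denom for n in counts)
    extra = s / denom
    prefix, lam, m = 0.0, 0.0, len(lower)
    for i in range(len(lower)):
        prefix += lower[i]
        lam = (prefix + extra) / (i + 1)  # common level for the i+1 smallest
        if i + 1 == len(lower) or lam <= lower[i + 1]:
            m = i + 1
            break
    probs = [lam] * m + lower[m:]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def imprecise_info_gain(feature, labels, classes, s):
    """Assumed criterion: H*(C) - sum_v P(X=v) * H*(C | X=v)."""
    def counts(ys):
        c = Counter(ys)
        return [c[k] for k in classes]  # zero counts still get IDM mass

    n = len(labels)
    gain = max_entropy_idm(counts(labels), s)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= len(subset) / n * max_entropy_idm(counts(subset), s)
    return gain


def choose_split(rows, labels, s, n_try, rng):
    """Best of n_try randomly chosen features (RF-style feature sampling).

    Returns (feature_index, gain), or None when no candidate has positive
    gain; treating that case as a leaf is an assumption of this sketch.
    """
    classes = sorted(set(labels))
    candidates = rng.sample(range(len(rows[0])), n_try)
    best = None
    for f in candidates:
        g = imprecise_info_gain([r[f] for r in rows], labels, classes, s)
        if g > 0 and (best is None or g > best[1]):
            best = (f, g)
    return best


def sample_tree_s(rng, s_max=2.0):
    """Hypothetical: draw one IDM parameter per tree, uniformly in [0, s_max]."""
    return rng.uniform(0.0, s_max)


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = [0, 0, 1, 1]
    s = sample_tree_s(rng)  # each tree in the forest would get its own s
    print(s, choose_split(rows, labels, s=s, n_try=2, rng=rng))
```

With s = 0 the credal set collapses to the observed frequencies and the criterion reduces to the classic information gain; larger values of s pull the maximum-entropy distributions toward uniform, most strongly on small subsets, so different per-tree values of s produce differently shaped trees. In this sketch that per-tree draw is what adds randomness on top of the usual instance and feature sampling.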


Pages: 228-243

Number of pages: 16

References (52 in total):

[1] AdaptativeCC4.5: Credal C4.5 with a rough class noise estimator [J].