One class random forests

Times cited: 128
Authors
Desir, Chesner [1]
Bernard, Simon [1]
Petitjean, Caroline [1]
Heutte, Laurent [1]
Affiliation
[1] Univ Rouen, LITIS, EA 4108, F-76801 St Etienne, France
Keywords
One class classification; Supervised learning; Decision trees; Ensemble methods; Random forests; Outlier generation; Outlier detection; ONE-CLASS CLASSIFICATION; NOVELTY DETECTION; NETWORK; SUPPORT; CLASSIFIERS; NORMALITY; DENSITY; TESTS
DOI
10.1016/j.patcog.2013.05.022
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One class classification is a binary classification task for which only one class of samples is available for learning. In some preliminary works, we proposed One Class Random Forests (OCRF), a method based on a random forest algorithm and an original outlier generation procedure that makes use of classifier ensemble randomization principles. In this paper, we propose an extensive study of the behavior of OCRF that includes experiments on various UCI public datasets and a comparison, with statistical significance testing, to reference one class methods, namely Gaussian density models, Parzen estimators, Gaussian mixture models and One Class SVMs. Our aim is to show that the randomization principles embedded in a random forest algorithm make the outlier generation process more efficient and, in particular, make it possible to break the curse of dimensionality. One Class Random Forests are shown to perform well in comparison to other methods and, in particular, to maintain stable performance in higher dimensions, whereas the other algorithms may fail. © 2013 Elsevier Ltd. All rights reserved.
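The central idea summarized above, generating artificial outliers per tree inside that tree's randomly selected feature subspace so that a binary learner can be trained on target-versus-outlier data, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' OCRF procedure: the helper name `generate_subspace_outliers`, the sqrt(d) subspace size, the uniform draws and the bounding-box enlargement factor `alpha` are all hypothetical choices made for illustration. The returned bootstrap sample (target class) and outliers would then serve as the two classes for training one decision tree of the forest.

```python
import math
import random


def generate_subspace_outliers(target, n_outliers, n_features_sub=None,
                               alpha=0.1, seed=None):
    """Hypothetical helper: outlier generation for ONE tree of the forest.

    target: list of equal-length feature vectors (the only available class).
    Returns (features, bootstrap, outliers): the selected column indices,
    the bootstrap sample projected onto them, and uniform artificial
    outliers drawn from the enlarged bounding box of that projection.
    """
    rng = random.Random(seed)
    d = len(target[0])
    if n_features_sub is None:
        # Assumption: sqrt(d) features per tree, a common random-forest default.
        n_features_sub = max(1, int(math.sqrt(d)))

    # Random subspace selection: feature indices drawn without replacement.
    features = sorted(rng.sample(range(d), n_features_sub))

    # Bagging: bootstrap the target data, then project onto the subspace.
    sample = [rng.choice(target) for _ in range(len(target))]
    bootstrap = [[row[j] for j in features] for row in sample]

    # Bounding box of the projected bootstrap, enlarged by alpha per side
    # (assumption: a simple box, not the paper's exact generation domain).
    lows, highs = [], []
    for k in range(n_features_sub):
        col = [row[k] for row in bootstrap]
        lo, hi = min(col), max(col)
        margin = alpha * (hi - lo)
        lows.append(lo - margin)
        highs.append(hi + margin)

    # Uniform artificial outliers inside the enlarged box (low-dimensional).
    outliers = [[rng.uniform(lows[k], highs[k]) for k in range(n_features_sub)]
                for _ in range(n_outliers)]
    return features, bootstrap, outliers


if __name__ == "__main__":
    data_rng = random.Random(0)
    # 200 synthetic target samples in 50 dimensions.
    target = [[data_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
    features, boot, outliers = generate_subspace_outliers(target, 500, seed=1)
    print(len(features), len(boot), len(outliers[0]))  # 7 200 7
```

Because each tree only has to fill a 7-dimensional box in this example rather than the full 50-dimensional space, a fixed budget of outliers covers its region far more densely; this is the intuition behind the claim that randomization makes outlier generation more efficient in higher dimensions.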
Pages: 3490-3506
Number of pages: 17