A novel approach to estimate proximity in a random forest: An exploratory study

Cited by: 18
Authors
Englund, C. [1]
Verikas, A. [2, 3]
Affiliations
[1] Viktoria Inst, S-41756 Gothenburg, Sweden
[2] Halmstad Univ, Intelligent Syst Lab, S-30118 Halmstad, Sweden
[3] Kaunas Univ Technol, Dept Elect & Control Equipment, LT-51368 Kaunas, Lithuania
Keywords
Random forest; Proximity matrix; Support vector machine; Kernel matrix; Data mining
DOI
10.1016/j.eswa.2012.05.094
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A data proximity matrix is an important information source in random forest (RF) based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and finding mislabeled data samples. A novel approach to estimating proximity is proposed in this work. The approach is based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of a data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is experimentally shown that the proposed approach improves the proximity estimate, especially when the RF is made of a small number of trees. It is also demonstrated that, for some tasks, an SVM exploiting the suggested proximity matrix based kernel outperforms an SVM based on a standard radial basis function kernel and the standard proximity matrix based kernel. (c) 2012 Elsevier Ltd. All rights reserved.
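For orientation, the two ingredients named in the abstract can be sketched in a few lines of dependency-free Python. The first function computes the standard RF proximity (the fraction of trees in which two samples reach the same terminal node). The second is only an illustrative assumption: the abstract says the proposed approach measures distance between two terminal nodes but does not give the formula, so `leaf_path_distance` simply counts tree edges between two leaves and should not be read as the authors' method. The function names, the `leaves` layout, and the "L"/"R" path encoding are all hypothetical.

```python
def standard_proximity(leaves):
    """Standard RF proximity: share of trees where two samples share a leaf.

    leaves[i][t] is the id of the terminal node that sample i reaches
    in tree t (hypothetical input layout).
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for t in range(n_trees)
                         if leaves[i][t] == leaves[j][t])
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def leaf_path_distance(path_a, path_b):
    """Assumed distance between two leaves of one tree: number of edges
    on the path joining them. Each leaf is given as its root-to-leaf
    sequence of "L"/"R" branch choices. Not taken from the paper.
    """
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


# Three samples, two trees (made-up leaf ids for illustration).
leaves = [
    [3, 5],
    [3, 6],
    [4, 6],
]
prox = standard_proximity(leaves)
dist = leaf_path_distance("LLR", "LRL")
```

A symmetric matrix of this kind is what the abstract proposes to evaluate by using it as an SVM kernel matrix; in scikit-learn, for example, a precomputed kernel can be supplied via `SVC(kernel="precomputed")`.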
Pages: 13046-13050
Number of pages: 5