A novel approach to estimate proximity in a random forest: An exploratory study
Times cited: 18
Authors:
Englund, C. [1]
Verikas, A. [2, 3]
Affiliations:
[1] Viktoria Inst, S-41756 Gothenburg, Sweden
[2] Halmstad Univ, Intelligent Syst Lab, S-30118 Halmstad, Sweden
[3] Kaunas Univ Technol, Dept Elect & Control Equipment, LT-51368 Kaunas, Lithuania
Keywords:
Random forest;
Proximity matrix;
Support vector machine;
Kernel matrix;
Data mining;
DOI:
10.1016/j.eswa.2012.05.094
Chinese Library Classification (CLC) number:
TP18 [Theory of artificial intelligence];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
A data proximity matrix is an important information source in random forest (RF)-based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and finding mislabeled data samples. A novel approach to estimate proximity is proposed in this work. The approach is based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of a data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is experimentally shown that the proposed approach improves the proximity estimate, especially when the RF is made of a small number of trees. It is also demonstrated that, for some tasks, an SVM exploiting the suggested proximity-matrix-based kernel outperforms both an SVM based on a standard radial basis function kernel and one based on the standard proximity-matrix-based kernel. (c) 2012 Elsevier Ltd. All rights reserved.
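The contrast between the standard proximity (two samples count as close only when they land in the same terminal node) and a proximity built from the distance between terminal nodes can be sketched in a few lines. This is a minimal sketch, not the authors' method: the abstract does not give the exact measure, so counting edges through the lowest common ancestor and mapping a distance d to 1 / (1 + d) are assumptions made here for illustration. The function names (`standard_proximity`, `leaf_distance`, `distance_proximity`) are hypothetical, and each tree is represented simply as the root-to-leaf node-id path of every sample rather than through any particular RF library.

```python
from typing import List, Sequence, Tuple

# A sample's route through one tree: node ids from the root down to its terminal node.
Path = Tuple[int, ...]
Matrix = List[List[float]]


def standard_proximity(paths: Sequence[Sequence[Path]]) -> Matrix:
    """Standard RF proximity: fraction of trees in which samples i and j
    end up in the same terminal node."""
    n_trees, n = len(paths), len(paths[0])
    counts = [[0] * n for _ in range(n)]
    for tree in paths:
        for i in range(n):
            for j in range(n):
                if tree[i][-1] == tree[j][-1]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


def leaf_distance(a: Path, b: Path) -> int:
    """Number of edges between two terminal nodes of the same tree,
    i.e. the path that runs through their lowest common ancestor."""
    common = 0  # length of the shared root-to-node prefix
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def distance_proximity(paths: Sequence[Sequence[Path]]) -> Matrix:
    """Hypothetical distance-based proximity (assumed mapping): per tree,
    turn the leaf-to-leaf distance d into 1 / (1 + d), then average over trees."""
    n_trees, n = len(paths), len(paths[0])
    totals = [[0.0] * n for _ in range(n)]
    for tree in paths:
        for i in range(n):
            for j in range(n):
                totals[i][j] += 1.0 / (1 + leaf_distance(tree[i], tree[j]))
    return [[t / n_trees for t in row] for row in totals]


if __name__ == "__main__":
    # Two toy trees, three samples (s0, s1, s2).
    paths = [
        [(0, 1, 3), (0, 1, 4), (0, 2)],  # tree 1: s0 and s1 sit in sibling leaves
        [(0, 1), (0, 2), (0, 2)],        # tree 2: s1 and s2 share a leaf
    ]
    print(standard_proximity(paths))
    print(distance_proximity(paths))
```

On these toy trees the standard proximity is 0 for both (s0, s1) and (s0, s2), whereas the assumed distance-based variant gives 1/3 and 7/24, so it still ranks s1 as closer to s0 than s2 is. This illustrates why a distance-based estimate can be less coarse when the forest has few trees. Either matrix could then be supplied to an SVM as a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`) to compare classification accuracy, as the abstract proposes.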
Pages: 13046-13050
Number of pages: 5