Classification of web documents using graph matching

Times cited: 49
Authors
Schenker, A
Last, M
Bunke, H
Kandel, A
Affiliations
[1] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[2] Univ Bern, Dept Comp Sci, Inst Informat & Angew Math, CH-3012 Bern, Switzerland
[3] Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel
Keywords
graph representation; graph matching; document classification; k-nearest neighbors algorithm
DOI
10.1142/S0218001404003241
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we describe a classification method that allows the use of graph-based representations of data instead of traditional vector-based representations. We compare the vector approach combined with the k-Nearest Neighbor (k-NN) algorithm to the graph-matching approach when classifying three different web document collections, using the leave-one-out approach for measuring classification accuracy. We also compare the performance of different graph distance measures as well as various document representations that utilize graphs. The results show that the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
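The abstract does not spell out the graph representation or the distance measures, so the following is only a minimal sketch of the general idea: k-NN over graphs instead of vectors, scored with leave-one-out. It assumes a simple representation (one node per unique term, a directed edge between consecutive terms) and an MCS-based distance in the style of Bunke and Shearer, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), with graph size taken as nodes plus edges. These choices, the toy documents, and all function names (`build_graph`, `mcs_distance`, `knn_predict`, `leave_one_out_accuracy`) are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical document graph: a set of unique term nodes and a set of
# directed edges (term_a, term_b) for terms that appear consecutively.
Graph = tuple[frozenset[str], frozenset[tuple[str, str]]]


def build_graph(text: str) -> Graph:
    """Build a graph with one node per unique term (assumed representation)."""
    terms = text.lower().split()
    nodes = frozenset(terms)
    edges = frozenset(zip(terms, terms[1:]))
    return nodes, edges


def graph_size(g: Graph) -> int:
    """Graph size taken here as |V| + |E|."""
    nodes, edges = g
    return len(nodes) + len(edges)


def mcs_size(g1: Graph, g2: Graph) -> int:
    """Size of the maximum common subgraph.

    Because every node label is unique, the common subgraph is just the
    shared nodes plus the shared edges, so no combinatorial search is needed.
    """
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])


def mcs_distance(g1: Graph, g2: Graph) -> float:
    """d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|)."""
    larger = max(graph_size(g1), graph_size(g2))
    if larger == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - mcs_size(g1, g2) / larger


def knn_predict(query: Graph, train: list[tuple[Graph, str]], k: int) -> str:
    """Majority vote among the k training graphs closest to the query."""
    neighbors = sorted(train, key=lambda item: mcs_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common returns the label first seen among the sorted neighbors.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data: list[tuple[Graph, str]], k: int) -> float:
    """Classify each graph using all the other graphs as training data."""
    correct = 0
    for i, (graph, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(graph, rest, k) == label:
            correct += 1
    return correct / len(data)


docs = [
    ("graph matching for document classification", "cs"),
    ("graph distance for document classification", "cs"),
    ("football match score today", "sport"),
    ("football league score table", "sport"),
]
data = [(build_graph(text), label) for text, label in docs]
print(leave_one_out_accuracy(data, k=1))  # prints 1.0
```

With k = 1, each toy document's nearest neighbour shares its label, so the leave-one-out accuracy here is 1.0. The example only illustrates the mechanics and says nothing about the accuracies reported in the paper. Unique node labels are what make the MCS cheap to compute in this sketch; for general labelled graphs, MCS computation is NP-hard.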
Pages: 475-496
Number of pages: 22