Similarity Evaluation of XML Documents Based on Weighted Element Tree Model

Cited by: 0
Authors
Wang, Chenying [1]
Yuan, Xiaojie [1]
Ning, Hua [1]
Lian, Xin [1]
Affiliation
[1] Nankai Univ, Dept Comp Sci & Technol, Tianjin 300071, Peoples R China
Source
ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS | 2009 / Vol. 5678
Keywords
XML; similarity evaluation; clustering; element tree
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The logical representation model of XML data is the basis of XML data management. After reviewing XML tree models and frequent pattern models, this paper proposes a novel Weighted Element Tree Model (WETM) for measuring the structural similarity of XML documents. The model is a concise form of XML tree models, so operations on it are more efficient than operations on XML tree models. Compared with frequent pattern models, the WETM better expresses the structural information of subtrees, which can improve the accuracy of similarity evaluation. To evaluate the performance of the proposed algorithm, it is applied to XML document clustering. The experimental results show that the algorithm outperforms algorithms based on tree models or frequent pattern models.
Pages: 680-687
Page count: 8