A Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix

Cited by: 2
Authors
Zhang, Xue-Liang [1 ]
Yang, Ting [1 ]
Fan, Bao-Quan [1 ]
Wang, Xu [1 ]
Wei, Jin-Mao [1 ]
Affiliation
[1] Nankai Univ, Coll Informat Tech Sci, Tianjin 300071, Peoples R China
Source
INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT B | 2012 / Vol. 24
Keywords
similarity; XML; semantic; structure; adjacency matrix;
DOI
10.1016/j.phpro.2012.02.215
Chinese Library Classification
T [Industrial Technology];
Subject classification code
08;
Abstract
Similarity measurement of XML documents is crucial for approximate search and document classification in XML-oriented applications. Several methods have been proposed for this purpose. Nevertheless, few of them capture both structure and semantic information well enough to measure the similarity of XML documents effectively. In this paper, we present a new method of computing the structure and semantic similarity of XML documents based on an extended adjacency matrix (EAM). Unlike a general adjacency matrix, an EAM stores the structure information of not only adjacent layers but also ancestor-descendant layers. To measure the similarity of two XML documents, the proposed method first stores their structure and semantic information in two extended adjacency matrices (M_1, M_2). It then computes the similarity of the two documents as cos(M_1, M_2). Experimental results on benchmark data show that the method achieves high efficiency and accuracy. (C) 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of ICAPIE Organization Committee.
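The two steps named in the abstract (build an EAM per document, then take the cosine) can be illustrated with a minimal Python sketch. The abstract does not give the paper's matrix definition, so everything specific here is an assumption: the matrix is stored sparsely and indexed by tag-name pairs, a parent-child pair is given weight 1, each further level of separation is scaled by a hypothetical `decay` factor, and "semantic" matching is reduced to exact tag-name equality. The function names are illustrative and not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET


def extended_adjacency(xml_text, decay=0.5):
    """Return a sparse matrix {(ancestor_tag, descendant_tag): weight}.

    Assumption: parent-child pairs weigh 1 and every extra level between
    ancestor and descendant multiplies the weight by `decay`. The paper's
    actual weighting is not stated in the abstract.
    """
    root = ET.fromstring(xml_text)
    weights = {}

    def visit(node, ancestors):
        # `ancestors` holds tag names from the root down to node's parent.
        for distance, anc in enumerate(reversed(ancestors), start=1):
            key = (anc, node.tag)
            weights[key] = weights.get(key, 0.0) + decay ** (distance - 1)
        for child in node:
            visit(child, ancestors + [node.tag])

    visit(root, [])
    return weights


def cosine(m1, m2):
    """Cosine of two sparse matrices treated as flattened vectors."""
    dot = sum(w * m2.get(key, 0.0) for key, w in m1.items())
    n1 = math.sqrt(sum(w * w for w in m1.values()))
    n2 = math.sqrt(sum(w * w for w in m2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def similarity(xml_a, xml_b, decay=0.5):
    """Similarity of two XML documents as cos(M_1, M_2)."""
    return cosine(extended_adjacency(xml_a, decay),
                  extended_adjacency(xml_b, decay))
```

Under these assumptions, `similarity("<a><b><c/></b></a>", "<a><b/><c/></a>")` credits the shared a-c ancestor-descendant relation as well as the shared a-b edge, which a plain parent-child adjacency matrix would not do.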
Pages: 1452-1461
Page count: 10
Related papers
50 in total
  • [1] A novel method for measuring semantic similarity for XML schema matching
    Jeong, Buhwan
    Lee, Damon
    Cho, Hyunbo
    Lee, Jaewook
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (03) : 1651 - 1658
  • [2] An Extended Role Based Access Control Method for XML Documents
    MENG Xiao-feng
    Wuhan University Journal of Natural Sciences, 2004, (05) : 740 - 744
  • [3] Measuring Similarities between XML Documents based on Content and Structure
    Xia, Xiaoling
    Guo, Yongming
    Le, Jiajin
    2009 ASIA-PACIFIC CONFERENCE ON INFORMATION PROCESSING (APCIP 2009), VOL 1, PROCEEDINGS, 2009, : 459 - 462
  • [4] Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX
    Magdaleno, Damny
    Fuentes, Vett E.
    Garcia, Maria M.
    COMPUTACION Y SISTEMAS, 2015, 19 (01): : 151 - 161
  • [5] Similarity search for office XML documents based on style and structure data
    Watanabe, Yousuke
    Kamigaito, Hidetaka
    Yokota, Haruo
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2013, 9 (02) : 100 - 116
  • [6] An efficient similarity-based approach for comparing XML documents
    Oliveira, Alessandreia
    Tessarolli, Gabriel
    Ghiotto, Gleiph
    Pinto, Bruno
    Campello, Fernando
    Marques, Matheus
    Oliveira, Carlos
    Rodrigues, Igor
    Kalinowski, Marcos
    Souza, Ueverton
    Murta, Leonardo
    Braganholo, Vanessa
    INFORMATION SYSTEMS, 2018, 78 : 40 - 57
  • [7] A METHODOLOGY FOR USING EDGES TO MEASURE STRUCTURAL AND SEMANTIC SIMILARITY OF XML DOCUMENTS
    Qiu, Hong-Jun
    Yu, Wen-Jing
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 1653 - +
  • [8] An improved method for classifying XML documents based on structure and content
    Zhang Na
    Zhang Dongzhan
    Yu Ye
    Duan Jiangjiao
    THIRD INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2010), 2010, : 426 - 430
  • [9] A Clustering Method Based on XML Schema Similarity
    Sun, Xia
    Wang, Hai-jun
    2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 1, 2011, : 340 - 343
  • [10] Novel mixed clustering method for XML documents
    College of Information and Communications Engineering, Harbin Engineering University, Harbin 150001, China
    Unknown
    Harbin Gongcheng Daxue Xuebao, 2007, 6 (697-701):