A Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix

Cited by: 2
Authors
Zhang, Xue-Liang [1 ]
Yang, Ting [1 ]
Fan, Bao-Quan [1 ]
Wang, Xu [1 ]
Wei, Jin-Mao [1 ]
Affiliations
[1] Nankai Univ, Coll Informat Tech Sci, Tianjin 300071, Peoples R China
Source
INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT B | 2012 / Vol. 24
Keywords
similarity; XML; semantic; structure; adjacency matrix;
DOI
10.1016/j.phpro.2012.02.215
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
Measuring the similarity of XML documents is crucial for approximate search and document classification in XML-oriented applications. Several methods have been proposed for this purpose, but few capture both structural and semantic information well enough to measure the similarity of XML documents effectively. In this paper, we present a new method for computing the structural and semantic similarity of XML documents based on the extended adjacency matrix (EAM). Unlike a general adjacency matrix, an EAM stores the structure information not only of adjacent layers but also of ancestor-descendant layers. To measure the similarity of two XML documents, the proposed method first stores their structure and semantic information in two extended adjacency matrices, M_1 and M_2, and then computes the similarity of the two documents as cos(M_1, M_2). Experimental results on benchmark data show that the method achieves high efficiency and accuracy. (C) 2011 Published by Elsevier B.V. Selection and/or peer-review under responsibility of ICAPIE Organization Committee.
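The two steps named in the abstract (build one matrix per document, then take the cosine) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual EAM definition, so everything specific here is an assumption: the function names `extended_adjacency`, `cosine`, and `xml_similarity`, the `decay` weight applied to more distant ancestor-descendant pairs, and the choice to index rows and columns by tag name. The sketch covers structure only; tags are matched by exact name, and no semantic matching is attempted.

```python
import math
import xml.etree.ElementTree as ET


def extended_adjacency(xml_text, tags, decay=0.5):
    """Build a len(tags) x len(tags) matrix for one document.

    Entry [i][j] accumulates decay**(k - 1) for every pair in which an
    element tagged tags[i] is an ancestor, k levels up, of an element
    tagged tags[j]. k = 1 is an ordinary parent-child edge.
    The weighting scheme is an assumption, not the paper's definition.
    """
    index = {tag: i for i, tag in enumerate(tags)}
    matrix = [[0.0] * len(tags) for _ in tags]

    def visit(node, ancestors):
        # ancestors is ordered root-first, so the parent is the last item.
        for distance, anc in enumerate(reversed(ancestors), start=1):
            matrix[index[anc]][index[node.tag]] += decay ** (distance - 1)
        for child in node:
            visit(child, ancestors + [node.tag])

    visit(ET.fromstring(xml_text), [])
    return matrix


def cosine(m1, m2):
    """Cosine of two equally sized matrices, treated as flat vectors."""
    a = [x for row in m1 for x in row]
    b = [x for row in m2 for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def xml_similarity(doc1, doc2, decay=0.5):
    """Index both documents over their shared tag vocabulary, then compare."""
    tags = sorted({e.tag for d in (doc1, doc2) for e in ET.fromstring(d).iter()})
    return cosine(extended_adjacency(doc1, tags, decay),
                  extended_adjacency(doc2, tags, decay))


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author/></book>"
print(xml_similarity(doc_a, doc_b))
```

Under these assumptions, a document compared with itself scores 1, and two documents with no ancestor-descendant tag pair in common score 0. Setting `decay=0` reduces the matrix to plain parent-child adjacency, which shows what the ancestor-descendant entries add.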
Pages: 1452 - 1461
Page count: 10
Related papers
50 in total
  • [21] Research on the Semantic Similarity Computation Method Based on EUO
    Shi, Hongxia
    [J]. THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 257 - 263
  • [22] A stream-based method to detect differences between XML documents
    Jang, Bumsuk
    Park, SeongHun
    Ha, Young-guk
    [J]. JOURNAL OF INFORMATION SCIENCE, 2017, 43 (01) : 39 - 53
  • [23] Fuzzy Semantic-Based String Similarity Experiments to Detect Plagiarism in Indonesian Documents
    Umareta, Chonan Firda Odayakana
    Mariyah, Siti
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS 2019), 2019,
  • [24] Adjacency Matrix based Analysis on a Novel Metamorphic Manual Operation Sugarcane Loader
    Gao, Dezhong
    Cai, Ganwei
    Shi, Hui
    Pan, Yuchen
    [J]. ADVANCED DESIGN AND MANUFACTURING TECHNOLOGY III, PTS 1-4, 2013, 397-400 : 1529 - 1533
  • [25] The method of judging satisfactory consistency of linguistic judgment matrix based on adjacency matrix and 3-loop matrix
    Jin, Fengxia
    Wang, Feng
    Zhao, Kun
    Chen, Huatao
    Guirao, Juan L. G.
    [J]. AIMS MATHEMATICS, 2024, 9 (07): : 18944 - 18967
  • [26] A DYNAMIC TOPOLOGY MODELING METHOD IN WIND TUNNEL GROUP BASED ON ADJACENCY MATRIX
    Luo Changjun
    Ma Yongyi
    He Fu
    Fu Xuanli
    Ren Xingqian
    [J]. 2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [27] Approximate XML structure validation based on document-grammar tree similarity
    Tekli, Joe
    Chbeir, Richard
    Traina, Agma J. M.
    Traina, Caetano, Jr.
    Fileto, Renato
    [J]. INFORMATION SCIENCES, 2015, 295 : 258 - 302
  • [28] A New Method for Measuring Topological Structure Similarity between Complex Trajectories
    Wang, Huimeng
    Du, Yunyan
    Yi, Jiawei
    Sun, Yong
    Liang, Fuyuan
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (10) : 1836 - 1848
  • [29] A novel sentence similarity measure for semantic-based expert systems
    Lee, Ming Che
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 6392 - 6399
  • [30] The spatial arrangement method of measuring similarity can capture high-dimensional semantic structures
    Richie, Russell
    White, Bryan
    Bhatia, Sudeep
    Hout, Michael C.
    [J]. BEHAVIOR RESEARCH METHODS, 2020, 52 (05) : 1906 - 1928