Extend tree edit distance for effective object identification

被引:2
作者
Wang, Yue [1 ]
Wang, Hongzhi [1 ]
Zhang, Liyan [2 ]
Wang, Yang [1 ]
Li, Jianzhong [1 ]
Gao, Hong [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[2] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA USA
关键词
XML data matching; New edit operations; Extended tree edit distance; k-Generation set distance; Cluster; XML; ALGORITHM;
D O I
10.1007/s10115-014-0816-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Similarity join on XML documents which are usually modeled as rooted ordered labeled trees is widely applied, due to the ambiguity of references to the real-world objects. The conventional method dealing with this issue is based on tree edit distance, which is shortage of flexibility and efficiency. In this paper, we propose two novel edit operations together with extended tree edit distance, which can achieve good performance in similarity matching with hierarchical data structures [the run-time is in the worst case]. And then, we propose -generation set distance as a good approximation of the tree edit distance to further improve the join efficiency with quadric time complexity. Experiments on real and synthetic databases demonstrate the benefit of our method in efficiency and scalability.
引用
收藏
页码:629 / 656
页数:28
相关论文
共 25 条
  • [1] Element similarity measures in XML schema matching
    Algergawy, Alsayed
    Nayak, Richi
    Saake, Gunter
    [J]. INFORMATION SCIENCES, 2010, 180 (24) : 4975 - 4998
  • [2] Augsten N., 2005, P 31 INT C VERY LARG, P301
  • [3] Augsten N, 2010, ACM T DATABASE SYST, V35
  • [4] A survey on tree edit distance and related problems
    Bille, P
    [J]. THEORETICAL COMPUTER SCIENCE, 2005, 337 (1-3) : 217 - 239
  • [5] Chawathe SS, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P90
  • [6] Demaine ED, 2007, LECT NOTES COMPUT SC, V4596, P146
  • [7] Dulucq S, 2003, LECT NOTES COMPUT SC, V2676, P83
  • [8] Flesca Sergio., 2002, In Proc. of the 5th Intl. Workshop on the Web and Databases, P55
  • [9] XML stream processing using tree-edit, distance embeddings
    Garofalakis, M
    Kumar, A
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2005, 30 (01): : 279 - 332
  • [10] Guha S, 2002, APPROXIMATE XML JOIN