Extend tree edit distance for effective object identification

被引:2
作者
Wang, Yue [1 ]
Wang, Hongzhi [1 ]
Zhang, Liyan [2 ]
Wang, Yang [1 ]
Li, Jianzhong [1 ]
Gao, Hong [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[2] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA USA
关键词
XML data matching; New edit operations; Extended tree edit distance; k-Generation set distance; Cluster; XML; ALGORITHM;
D O I
10.1007/s10115-014-0816-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Similarity join on XML documents which are usually modeled as rooted ordered labeled trees is widely applied, due to the ambiguity of references to the real-world objects. The conventional method dealing with this issue is based on tree edit distance, which is shortage of flexibility and efficiency. In this paper, we propose two novel edit operations together with extended tree edit distance, which can achieve good performance in similarity matching with hierarchical data structures [the run-time is in the worst case]. And then, we propose -generation set distance as a good approximation of the tree edit distance to further improve the join efficiency with quadric time complexity. Experiments on real and synthetic databases demonstrate the benefit of our method in efficiency and scalability.
引用
收藏
页码:629 / 656
页数:28
相关论文
共 25 条
  • [11] Integrating XML data sources using approximate joins
    Guha, Sudipto
    Jagadish, H. V.
    Koudas, Nick
    Srivastava, Divesh
    Yu, Ting
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2006, 31 (01): : 161 - 207
  • [12] Han Z, 2009, J COMPUT RES DEV S, P81
  • [13] Kailing K, 2004, LECT NOTES COMPUT SC, V2992, P676
  • [14] Klein P. N., 1998, Algorithms - ESA '98. 6th Annual European Symposium. Proceedings, P91
  • [15] Li F, 2010, LECT NOTES COMPUT SC, V6309, P3
  • [16] Li F, 2010, LECT NOTES COMPUT SC, V6185, P125, DOI 10.1007/978-3-642-16720-1_13
  • [17] Mozes S, 2008, SOME LOWER UPPER BOU
  • [18] NIERMAN A, 2002, P 5 INT WORKSH WEB D
  • [19] Shasha Dennis, 1995, APPROXIMATE TREE PAT, pChapter 14
  • [20] TREE-TO-TREE CORRECTION PROBLEM
    TAI, KC
    [J]. JOURNAL OF THE ACM, 1979, 26 (03) : 422 - 433