A tree-based approach to clustering XML documents by structure

Cited by: 0
Authors
Costa, G
Manco, G
Ortale, R
Tagarelli, A
Affiliations
[1] Inst Italian Natl Res Council, CNR, ICAR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
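As a rough illustration of the merge-and-prune idea behind a cluster representative, the sketch below is a simplified stand-in and not the authors' algorithm. It assumes each document can be summarized by its set of root-to-node tag paths, that merging amounts to counting in how many documents each path occurs, and that pruning drops paths below a support threshold. The names `tag_paths`, `representative`, `similarity` and the parameter `min_support` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def representative(docs, min_support=0.5):
    """Merge the path sets of all documents, then prune rare paths.

    min_support (an assumed parameter) is the fraction of documents
    in which a path must occur to be kept in the representative.
    """
    counts = Counter()
    for doc in docs:
        counts.update(tag_paths(doc))  # each path counted once per document
    return {p for p, c in counts.items() if c / len(docs) >= min_support}


def similarity(paths_a, paths_b):
    """Jaccard similarity between two path sets (an assumed measure)."""
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


docs = ["<a><b/><c/></a>", "<a><b/><d/></a>", "<a><b/><c/></a>"]
rep = representative(docs, min_support=0.6)
score = similarity(rep, tag_paths("<a><b/></a>"))
```

Because a prefix of a path occurs in at least as many documents as the path itself, the pruned set still describes a tree. A new document could then be assigned to the cluster whose representative it resembles most, with the representative recomputed afterwards.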
Pages: 137-148
Number of pages: 12