XProj: A Framework for Projected Structural Clustering of XML Documents

Authors
Aggarwal, Charu C. [1]
Feng, Jianhua [2]
Ta, Na [2]
Zaki, Mohammed [3]
Wang, Jianyong [2]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, 19 Skyline Dr, Hawthorne, NY 10532 USA
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Rensselaer Polytech Inst, Troy, NY USA
Source
KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2007
Keywords
XML; clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encode structural information about data records. However, this structural characteristic also makes such data sets challenging for a variety of data mining problems. One such problem is that of clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using multiple sub-structural information in XML documents to evaluate the quality of intermediate cluster solutions, and guide the algorithms to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.
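The abstract describes clustering guided by document substructures. The sketch below illustrates that general idea under simplifying assumptions of our own, and it is not the XProj algorithm itself: each document is reduced to the set of contiguous parent-child tag paths of up to `max_len` labels, a cluster is represented by the paths that occur in at least a `min_sup` fraction of its members, and each document is reassigned to the cluster whose representative covers the largest fraction of its own paths. The function names (`label_paths`, `representative`, `similarity`, `cluster`), the parameters, and the naive seeding are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text, max_len=2):
    """Hypothetical substructure choice: all contiguous ancestor-descendant
    tag paths of length 1..max_len found in the document tree."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        prefix = prefix + [node.tag]
        # Record every suffix of the current root-to-node path, up to max_len tags.
        for k in range(1, min(max_len, len(prefix)) + 1):
            paths.add(tuple(prefix[-k:]))
        for child in node:
            walk(child, prefix)

    walk(root, [])
    return paths


def representative(members, docs, min_sup):
    """Substructures present in at least a min_sup fraction of the cluster's documents."""
    if not members:
        return set()
    counts = Counter(s for i in members for s in docs[i])
    return {s for s, c in counts.items() if c / len(members) >= min_sup}


def similarity(doc, rep):
    """Fraction of a document's substructures covered by a cluster representative."""
    return len(doc & rep) / len(doc) if doc else 0.0


def cluster(xml_docs, k, max_len=2, min_sup=0.5, iters=10):
    """k-means-style loop: assign documents, then recompute representatives."""
    if not 1 <= k <= len(xml_docs):
        raise ValueError("k must be between 1 and the number of documents")
    docs = [label_paths(x, max_len) for x in xml_docs]
    # Naive seeding (assumption): the first k documents act as initial representatives.
    reps = [set(docs[i]) for i in range(k)]
    assign = None
    for _ in range(iters):
        # Assign each document to the most similar cluster (ties go to the lower index).
        new_assign = [max(range(k), key=lambda c: similarity(d, reps[c])) for d in docs]
        if new_assign == assign:
            break  # assignments are stable, so stop early
        assign = new_assign
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            reps[c] = representative(members, docs, min_sup)
    return assign


if __name__ == "__main__":
    toy = [
        "<book><title/><author/></book>",
        "<movie><title/><director/></movie>",
        "<book><title/><author/><year/></book>",
        "<movie><title/><director/><cast/></movie>",
    ]
    print(cluster(toy, k=2))  # prints [0, 1, 0, 1]
```

On the toy input, the two book-like and the two movie-like records fall into separate clusters, because the shared `title` path alone covers only a small fraction of each document. Coverage-based similarity is used here because it rewards a document for how much of its structure a cluster explains. This is only one of several plausible ways to score a document against a set of substructures.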
Pages: 46 / +
Number of pages: 3
Related papers
  • [1] A Framework for Clustering and Dynamic Maintenance of XML Documents
    Al-Shammari, Ahmed
    Liu, Chengfei
    Naseriparsa, Mehdi
    Bao Quoc Vo
    Anwar, Tarique
    Zhou, Rui
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2017, 2017, 10604 : 399 - 412
  • [2] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [3] Clustering XML documents using structural summaries
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 547 - 556
  • [4] Semantic Structural Similarity for Clustering XML Documents
    Kim, Tae-Soon
    Lee, Ju-Hong
    Song, Jae-Won
    ICHIT 2008: INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, PROCEEDINGS, 2008, : 552 - 557
  • [5] Clustering XML documents based on structural similarity
    Xing, Guangming
    Xia, Zhonghang
    Guo, Jinhua
    ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 905 - +
  • [6] FXProj - A Fuzzy XML Documents Projected Clustering Based on Structure and Content
    Ji, Tengfei
    Bao, Xiaoyuan
    Yang, Dongqing
    ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 406 - 419
  • [7] Semantic Structural Similarity Measure for Clustering XML Documents
    Song, Ling
    Ma, Jun
    Lei, Jingsheng
    Zhang, Dongmei
    Wang, Zhen
    WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 232 - +
  • [8] Hierarchical clustering of XML documents focused on structural components
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    Ritacco, Ettore
    DATA & KNOWLEDGE ENGINEERING, 2013, 84 : 26 - 46
  • [9] Structural-based Clustering Technique OF XML Documents
    Posonia, Mary A.
    Jyothi, V. L.
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON CIRCUITS, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2013), 2013, : 1239 - 1242