FXProj - A Fuzzy XML Documents Projected Clustering Based on Structure and Content

Citations: 0
Authors
Ji, Tengfei [1 ]
Bao, Xiaoyuan [1 ]
Yang, Dongqing [1 ]
Affiliations
[1] Peking Univ, Beijing 100871, Peoples R China
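Source
ADVANCED DATA MINING AND APPLICATIONS, PT I | 2011 / Vol. 7120
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes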
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
XML documents are inherently semi-structured, combining structural and content features. Most existing methods for clustering XML documents consider only one of these two aspects. In this paper, we propose a fuzzy projected clustering algorithm for XML documents that clusters them efficiently by combining structural and content features. Another contribution is the adoption of fuzzy techniques such that each frequent induced substructure has a fuzzy parameter associated with each cluster. Experimental results on both synthetic and real datasets show the algorithm's effectiveness, especially when it is applied to large schemaless XML document collections.
Pages: 406-419
Number of pages: 14
Related papers
16 entries in total
[1]   Universal text preprocessing for data compression [J].
Abel, J ;
Teahan, W .
IEEE TRANSACTIONS ON COMPUTERS, 2005, 54 (05) :497-507
[2]  
[Anonymous], P 13 ACM SIGKDD INT
[3]  
Dalamagas T, 2004, LECT NOTES COMPUT SC, V3268, P547
[4]  
DOMENICONI C, 2004, P SIAM INT C DAT MIN
[5]  
Doucet A., 2002, INEX WORKSH, V2002, P81
[6]   A weighted common structure based clustering technique for XML documents [J].
Hwang, Jeong Hee ;
Ryu, Keun Ho .
JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (07) :1267-1274
[7]  
Kutty S., 2010, Proceedings 2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010), P1167, DOI 10.1109/ICDMW.2010.106
[8]  
Kutty S., 2009, Proceedings of the 18th acm conference on information and knowledge management, P1729
[9]  
KUTTY S, 2009, P 2009 ACM S DOC ENG, P94
[10]  
Lesniewska A., 2009, ADV DAT INF SYST ASS, P238