A progressive clustering algorithm to group the XML data by structural and semantic similarity

Cited by: 16
Authors
Nayak, Richi [1]
Tran, Tien [1]
Affiliations
[1] Queensland Univ Technol, Sch Informat Syst, Brisbane, Qld, Australia
Keywords
XML; clustering; structure; semantic; heterogeneous
DOI
10.1142/S0218001407005648
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As XML has grown in popularity for data representation and exchange over the Web, the number of XML documents has increased rapidly. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that groups heterogeneous XML documents according to the similarity of their structural and semantic representations. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and the existing clusters, removing the need to compute the similarity between each pair of individual documents. The experimental analysis shows the method to be fast and accurate.
Pages: 723-743
Page count: 21