A progressive clustering algorithm to group the XML data by structural and semantic similarity

Cited by: 16

Authors:

Nayak, Richi [1]

Tran, Tien [1]

Affiliation:

[1] Queensland Univ Technol, Sch Informat Syst, Brisbane, Qld, Australia

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2007, Vol. 21, No. 04

Keywords:

XML; clustering; structure; semantic; heterogeneous;

DOI:

10.1142/S0218001407005648

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Since XML became popular for data representation and exchange over the Web, the number of XML documents in circulation has increased rapidly. Turning these documents into a more useful information resource has become a challenge for researchers. In this paper, we introduce a novel clustering algorithm, PCXSS, that groups heterogeneous XML documents according to the similarity of their structural and semantic representations. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and existing clusters, removing the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate.
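The progressive idea in the abstract, comparing each incoming document against existing clusters rather than against every other document, can be illustrated with a minimal sketch. This is not PCXSS or CPSim: the similarity here is a plain Jaccard overlap of root-to-leaf tag paths (an assumption made for illustration, with no semantic matching), and the names `root_to_leaf_paths`, `jaccard`, `progressive_cluster`, and the `threshold` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(path)  # leaf element: record the full path
        for child in children:
            walk(child, path)

    walk(root, ())
    return paths


def jaccard(a, b):
    """Stand-in similarity (NOT CPSim): overlap of two path sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def progressive_cluster(docs, threshold=0.5):
    """Assign each document to the most similar existing cluster,
    or open a new cluster when no cluster reaches the threshold."""
    clusters = []  # each cluster: {"paths": set of paths, "members": [doc indices]}
    for index, xml_text in enumerate(docs):
        paths = root_to_leaf_paths(xml_text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Document-to-cluster comparison only; no pairwise document matrix.
            sim = jaccard(paths, cluster["paths"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["paths"] |= paths  # cluster summary grows with each member
        else:
            clusters.append({"paths": set(paths), "members": [index]})
    return clusters
```

Under these assumptions each document is read once and compared only with the current cluster summaries, so the cost grows with the number of clusters rather than with the number of document pairs.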

Citation

Pages: 723-743

Page count: 21

