A weighted common structure based clustering technique for XML documents

Cited by: 11

Authors:

Hwang, Jeong Hee [2]

Ryu, Keun Ho [1]

Affiliations:

[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju 361763, Chungbuk, South Korea

[2] Namseoul Univ, Dept Comp Sci, Cheonan 331707, Chungnam, South Korea

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2010 / Vol. 83 / Issue 07

Keywords:

Data mining; XML mining; Document clustering; XML clustering;

DOI:

10.1016/j.jss.2010.02.004

Chinese Library Classification number:

TP31 [Computer software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because of its varied applicability in numerous applications. Therefore, XML documents constitute an important data mining domain. In this paper, we propose a new method of XML document clustering by a global criterion function, considering the weight of common structures. Our approach initially extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. Then, we perform clustering of an XML document by the weight of common structures, without a measure of pairwise similarity, assuming that an XML document is a transaction and frequent structures extracted from documents are items of the transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach. Crown Copyright (C) 2010 Published by Elsevier Inc. All rights reserved.

Pages: 1267-1274

Page count: 8
