XML schema clustering with semantic and hierarchical similarity measures

Cited by: 32
Authors
Nayak, Richi [1]
Iryadi, Wina [1]
Affiliations
[1] Queensland Univ Technol, Sch Informat Sci, Brisbane, Qld 4001, Australia
Keywords
clustering; data mining; document mining; XML; semi-structured data; semantic similarity; structural similarity; schema matching
DOI
10.1016/j.knosys.2006.08.006
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the growing popularity of XML as a data representation language, the number of XML data collections has exploded. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual properties of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 336-349
Page count: 14