XML data mining

Cited by: 6
Authors
Romei, Andrea [1]
Turini, Franco [1]
Affiliation
[1] Univ Pisa, Dept Comp Sci, I-56127 Pisa, Italy
Keywords
data mining; knowledge discovery; XML; XQuery; query language; inductive database; itemsets
DOI
10.1002/spe.944
CLC number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
With the spread of XML sources, mining XML data may become an important objective in the near future. This paper presents a project focussed on designing a general-purpose query language in support of mining XML data. In our framework, raw data, mining models and domain knowledge are represented as XML documents and stored in native XML databases. Data mining (DM) tasks are expressed in an extension of XQuery. Special attention is given to the frequent pattern discovery problem, and a way of exploiting domain-dependent optimizations and efficient data structures as deep as possible in the extraction process is presented. We report the results of a first set of experiments, showing that a good trade-off between expressiveness and efficiency in XML DM is not a chimera. Copyright (C) 2009 John Wiley & Sons, Ltd.
Pages: 101-130
Page count: 30