Don't be afraid of simpler patterns

Cited by: 0
Authors
Bringmann, Bjorn [1]
Zimmermann, Albrecht [1]
De Raedt, Luc [1]
Nijssen, Siegfried [1]
Affiliation
[1] Univ Freiburg, Machine Learning Lab, Inst Comp Sci, D-79110 Freiburg, Germany
Source
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS | 2006, Volume 4213
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. The trade-off is studied in the context of correlated pattern mining, which is concerned with finding the k-best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees, and graphs. The criteria used in our investigation are the typical ones in data mining, computational cost and predictive accuracy, and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to the following questions: How does the expressive power of the language affect the computational cost? What is the trade-off between the expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor of 1000) for mining graphs.
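The stepwise idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes chi-square as the convex criterion, uses plain itemsets only, and emulates a "simpler" and a "more complex" pattern language by capping the itemset length. All names (`chi2`, `chi2_upper_bound`, `top_k`, `max_len`) and the data are hypothetical. The k-th best score found in the simpler language seeds the pruning threshold for the search in the richer one, which is valid here because every simple pattern is also a pattern of the richer language.

```python
import heapq

def chi2(x, y, P, N):
    """Chi-square of the 2x2 table for a pattern covering x of P positives
    and y of N negatives (assumes P > 0 and N > 0)."""
    n, covered = P + N, x + y
    if covered == 0 or covered == n:
        return 0.0
    return n * (x * (N - y) - y * (P - x)) ** 2 / (covered * (n - covered) * P * N)

def chi2_upper_bound(x, y, P, N):
    """Bound on the score of any specialization: its coverage can only shrink,
    and a convex function is maximal at an extreme point of that region."""
    return max(chi2(x, 0, P, N), chi2(0, y, P, N))

def top_k(pos, neg, k, max_len, threshold=0.0):
    """Depth-first top-k itemset search with bound-based pruning.
    Returns (patterns sorted by descending score, number of patterns evaluated)."""
    P, N = len(pos), len(neg)
    items = sorted(set().union(*pos, *neg))
    best = []      # min-heap of (score, pattern); holds at most k entries
    visited = 0

    def search(prefix, start, cov_pos, cov_neg):
        nonlocal threshold, visited
        for i in range(start, len(items)):
            item = items[i]
            cp = [t for t in cov_pos if item in t]
            cn = [t for t in cov_neg if item in t]
            visited += 1
            x, y = len(cp), len(cn)
            if x + y == 0:
                continue  # pattern occurs nowhere
            pattern = prefix + (item,)
            score = chi2(x, y, P, N)
            if score >= threshold:
                heapq.heappush(best, (score, pattern))
                if len(best) > k:
                    heapq.heappop(best)
                if len(best) == k:
                    threshold = max(threshold, best[0][0])
            # Descend only if some specialization could still reach the threshold.
            if len(pattern) < max_len and chi2_upper_bound(x, y, P, N) >= threshold:
                search(pattern, i + 1, cp, cn)

    search((), 0, pos, neg)
    return sorted(best, reverse=True), visited

# Toy data: transactions labelled positive or negative (made up for illustration).
pos = [frozenset("abc"), frozenset("ab"), frozenset("abd"), frozenset("bc")]
neg = [frozenset("ac"), frozenset("cd"), frozenset("ad"), frozenset("d")]
k = 2

# Step 1: mine the simpler language (single items only).
simple, _ = top_k(pos, neg, k, max_len=1)
seed = simple[-1][0] if len(simple) == k else 0.0

# Step 2: mine the richer language, without and with the seeded threshold.
full_plain, visited_plain = top_k(pos, neg, k, max_len=4)
full_seeded, visited_seeded = top_k(pos, neg, k, max_len=4, threshold=seed)
print(visited_plain, visited_seeded)
```

Both runs of the richer search return the same top-k scores; the seeded run can only prune at least as early. On data this small the saving is negligible, so the sketch shows the mechanism and says nothing about the speed-ups reported for graphs.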
Pages: 55-66
Page count: 12