An effective association rule mining scheme using a new generic basis

被引:0
作者
Jayakrushna Sahoo
Ashok Kumar Das
A. Goswami
机构
[1] Indian Institute of Technology,Department of Mathematics
[2] International Institute of Information Technology,Center for Security, Theory and Algorithmic Research
来源
Knowledge and Information Systems | 2015年 / 43卷
关键词
Data mining; Association rule mining; Condensed representations; Basis for association rules; Frequent closed itemset;
D O I
暂无
中图分类号
学科分类号
摘要
Association rule mining among itemsets is a fundamental task and is of great importance in many data mining applications including attacks in network data, stock market, financial applications, bioinformatics to find genetic disorders, etc. However, association rule extraction from a reasonable-sized database produces a large number of rules. As a result, many of them are redundant to other rules, and they are practically useless. To overcome this issue, methods for mining non-redundant rules are essentially required. To address such problem, we initially propose a definition for redundancy in sense of minimal knowledge and then a compact representation of non-redundant association rules which we call as compact informative generic basis. We also provide an improved version of the existing DCI_CLOSED algorithm (DCI_PLUS) to find out the frequent closed itemsets (FCI) with their minimal representative generators in combination with BitTable which represents a compact database form in a single scan of the original database. We further introduce an algorithm for constructing the compact informative generic basis from the FCI and their generators in an efficient way. We finally present an inference mechanism in which all association rules can be generated without accessing the database. Experiments are performed on the proposed method. The experimental results show that the proposed method outperforms the other existing related methods.
引用
收藏
页码:127 / 156
页数:29
相关论文
共 67 条
[1]  
Bastide Y(2000)Mining frequent patterns with counting inference SIGKDD Explor Newsl 2 66-75
[2]  
Taouil R(2006)On condensed representations of constrained frequent patterns Knowl Inf Syst 9 180-201
[3]  
Pasquier N(2003)Free-sets: a condensed representation of Boolean data for the approximation of frequency queries Data Min Knowl Disc 7 5-22
[4]  
Stumme G(2007)Non-derivable itemset mining Data Min Knowl Disc 14 171-206
[5]  
Lakhal L(2008)Effective elimination of redundant association rules Data Min Knowl Disc 16 221-249
[6]  
Bonchi F(2007)BitTableFI: an efficient mining frequent itemsets algorithm Knowl Based Syst 20 329-335
[7]  
Lucchese C(2005)Fast algorithms for frequent itemset mining using FP-trees IEEE Trans Knowl Data Eng 17 1347-1362
[8]  
Boulicaut JF(2008)Succinct minimal generators: theoretical foundations and applications Int J Found Comput Sci 19 271-296
[9]  
Bykowski A(2002)Pincer-search: an efficient algorithm for discovering the maximum frequent set IEEE Trans Knowl Data Eng 14 553-566
[10]  
Rigotti C(2008)A new concise representation of frequent itemsets using generators and a positive border Knowl Inf Syst 17 35-56