An effective association rule mining scheme using a new generic basis

被引:20
作者
Sahoo, Jayakrushna [1 ]
Das, Ashok Kumar [2 ]
Goswami, A. [1 ]
机构
[1] Indian Inst Technol, Dept Math, Kharagpur 721302, W Bengal, India
[2] Int Inst Informat Technol, Ctr Secur Theory & Algorithm Res, Hyderabad 500032, Andhra Pradesh, India
关键词
Data mining; Association rule mining; Condensed representations; Basis for association rules; Frequent closed itemset; CONDENSED REPRESENTATION; EFFICIENT ALGORITHM; CLOSED ITEMSETS; FREQUENT; GENERATORS; PATTERNS;
D O I
10.1007/s10115-014-0732-4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Association rule mining among itemsets is a fundamental task and is of great importance in many data mining applications including attacks in network data, stock market, financial applications, bioinformatics to find genetic disorders, etc. However, association rule extraction from a reasonable-sized database produces a large number of rules. As a result, many of them are redundant to other rules, and they are practically useless. To overcome this issue, methods for mining non-redundant rules are essentially required. To address such problem, we initially propose a definition for redundancy in sense of minimal knowledge and then a compact representation of non-redundant association rules which we call as compact informative generic basis. We also provide an improved version of the existing DCI_CLOSED algorithm (DCI_PLUS) to find out the frequent closed itemsets (FCI) with their minimal representative generators in combination with BitTable which represents a compact database form in a single scan of the original database. We further introduce an algorithm for constructing the compact informative generic basis from the FCI and their generators in an efficient way. We finally present an inference mechanism in which all association rules can be generated without accessing the database. Experiments are performed on the proposed method. The experimental results show that the proposed method outperforms the other existing related methods.
引用
收藏
页码:127 / 156
页数:30
相关论文
共 44 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI DOI 10.5555/645920.672836
[3]  
[Anonymous], 2012, Formal concept analysis: mathematical foundations
[4]  
[Anonymous], 2000, SIGMOD INT WORKSHOP
[5]  
[Anonymous], UCI MACHINE LEARNING
[6]  
Bastide Y., 2000, SIGKDD EXPLORATIONS, V2, P66, DOI DOI 10.1145/380995.381017
[7]   DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets [J].
Bay Vo ;
Hong, Tzung-Pei ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (08) :7196-7206
[8]   A new generic basis of "factual" and "implicative" association rules [J].
Ben Yahia, Sadok ;
Gasmi, Ghada ;
Nguifo, Engelbert Mephu .
INTELLIGENT DATA ANALYSIS, 2009, 13 (04) :633-656
[9]   On condensed representations of constrained frequent patterns [J].
Bonchi, F ;
Lucchese, C .
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (02) :180-201
[10]   Free-sets: A condensed representation of Boolean data for the approximation of frequency queries [J].
Boulicaut, JF ;
Bykowski, A ;
Rigotti, C .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (01) :5-22