A New Fast Vertical Method for Mining Frequent Patterns

被引:0
作者
Deng, Zhihong [1 ]
Wang, Zhonghui [1 ]
机构
[1] Peking Univ, Key Lab Machine Percept, Minist Educ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
来源
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS | 2010年 / 3卷 / 06期
基金
中国国家自然科学基金; 国家高技术研究发展计划(863计划);
关键词
data mining; frequent pattern mining; data structure; algorithm; CLOSED ITEMSETS; ALGORITHMS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Vertical mining methods are very effective for mining frequent patterns and usually outperform horizontal mining methods. However, the vertical methods become ineffective since the intersection time starts to be costly when the cardinality of tidset (tid-list or diffset) is very large or there are a very large number of transactions. In this paper, we propose a novel vertical algorithm called PPV for fast frequent pattern discovery. PPV works based on a data structure called Node-lists, which is obtained from a coding prefix-tree called PPC-tree. The efficiency of PPV is achieved with three techniques. First, the Node-list is much more compact compared with previous proposed vertical structure (such as tid-lists or diffsets) since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of support is transformed into the intersection of Node-lists and the complexity of intersecting two Node-lists can be reduced to O(m+n) by an efficient strategy, where m and n are the cardinalities of the two Node-lists respectively. Third, the ancestor-descendant relationship of two nodes, which is the basic step of intersecting Node-lists, can be very efficiently verified by Pre-Post codes of nodes. We experimentally compare our algorithm with FP-growth, and two prominent vertical algorithms (Eclat and dEclat) on a number of databases. The experimental results show that PPV is an efficient algorithm that outperforms FP-growth, Eclat, and dEclat.
引用
收藏
页码:733 / 744
页数:12
相关论文
共 16 条
[1]  
Agrawal R, P 1993 ACM SIGMOD IN, P207
[2]  
Agrawal R., P 1994 INT C VER LAR, P487
[3]  
GRUST T, P 2002 ACM SIGMOD IN, P109
[4]  
Han J, P 2000 ACM SIGMOD IN, P1
[5]   Mining frequent patterns without candidate generation: A frequent-pattern tree approach [J].
Han, JW ;
Pei, J ;
Yin, YW ;
Mao, RY .
DATA MINING AND KNOWLEDGE DISCOVERY, 2004, 8 (01) :53-87
[6]  
LIU G, P 2003 INT C DAT SYS, P65
[7]  
Park JS, 1997, IEEE T KNOWL DATA EN, V9, P813, DOI 10.1109/69.634757
[8]  
Qiao SJ, 2010, INT J COMPUT INT SYS, V3, P343
[9]  
SAVASERE A, P 1995 INT C VER LAR, P432
[10]  
SHENOY P, P 2000 ACM SIGMOD IN, P22