Efficient frequent pattern mining based on Linear Prefix tree

被引:77
作者
Pyun, Gwangbum [1 ]
Yun, Unil [1 ]
Ryu, Keun Ho [2 ]
机构
[1] Sejong Univ, Dept Comp Engn, Seoul, South Korea
[2] Chungbuk Natl Univ, Dept Comp Sci, Cheongju, South Korea
基金
新加坡国家研究基金会;
关键词
Data mining; Frequent pattern mining; Linear tree; Pattern growth; Knowledge discovery; SEQUENTIAL PATTERNS;
D O I
10.1016/j.knosys.2013.10.013
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Outstanding frequent pattern mining guarantees both fast runtime and low memory usage with respect to various data with different types and sizes. However, it is hard to improve the two elements since runtime is inversely proportional to memory usage in general. Researchers have made efforts to overcome the problem and have proposed mining methods which can improve both through various approaches. Many of state-of-the-art mining algorithms use tree structures, and they create nodes independently and connect them as pointers when constructing their own trees. Accordingly, the methods have pointers for each node in the trees, which is an inefficient way since they should manage and maintain numerous pointers. In this paper, we propose a novel tree structure to solve the limitation. Our new structure, LP-tree (Linear Prefix - Tree) is composed of array forms and minimizes pointers between nodes. In addition, LP-tree uses minimum information required in mining process and linearly accesses corresponding nodes. We also suggest an algorithm applying LP-tree to the mining process. The algorithm is evaluated through various experiments, and the experimental results show that our approach outperforms previous algorithms in term of the runtime, memory, and scalability. (C) 2013 Elsevier B.V. All rights reserved.
引用
收藏
页码:125 / 139
页数:15
相关论文
共 44 条
[1]  
Agrawal R., P 20 INT C VERY LARG
[2]   Interactive mining of high utility patterns over data streams [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Choi, Ho-Jin .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (15) :11979-11991
[3]   A framework for mining interesting high utility patterns with a strong frequency affinity [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Choi, Ho-Jin .
INFORMATION SCIENCES, 2011, 181 (21) :4878-4894
[4]  
Andres B., 2010, SOFTWARE PRACTICE EX, V35, P159
[5]  
[Anonymous], J COMPUT INF SYST
[6]  
[Anonymous], 2003, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
[7]   MAFIA: A maximal frequent itemset algorithm [J].
Burdick, D ;
Calimlim, M ;
Flannick, J ;
Gehrke, J ;
Yiu, TM .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (11) :1490-1504
[8]   Comparative analysis of sequence weighting approaches for mining time-interval weighted sequential patterns [J].
Chang, Joong Hyuk ;
Park, Nam Hun .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (03) :3867-3873
[9]   SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows [J].
Chang, Lei ;
Wang, Tengjiao ;
Yang, Dongqing ;
Luan, Hua .
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, :83-+
[10]  
Chuancong Gao, 2011, Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), P1044, DOI 10.1109/ICDM.2011.61