A Bounded and Adaptive Memory-Based Approach to Mine Frequent Patterns From Very Large Databases

被引:11
作者
Adnan, Muhaimenul [1 ]
Alhajj, Reda [1 ,2 ]
机构
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
[2] Global Univ, Dept Comp Sci, Beirut 155085, Lebanon
来源
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2011年 / 41卷 / 01期
关键词
Association rules mining (ARM); FP-growth; FP-tree; frequent pattern mining; frequent patterns; index structures; secondary storage; virtual memory management (VMM); TREE;
D O I
10.1109/TSMCB.2010.2048900
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Most of the existing methods to solve the problem of association rules mining (ARM) rely on special data structures to project the database (either totally or partially) in the primary memory. Traditionally, these data structures reside in the main memory and rely on the existing paging mechanism of the virtual memory manager (VMM) to handle the storage problem when they go out of the primary memory. Typically, VMM stores the overloaded data into the secondary memory based on some preassumed memory usage criteria. However, this direct and unplanned use of virtual memory results in an unpredictable behavior or thrashing, as depicted by some of the works described in the literature. This problem is tackled in this paper by presenting an ARM model capable of mining a transactional database, regardless of its size and without relying on the underlying VMM; the proposed approach could use only a bounded portion of the primary memory and this gives the opportunity to assign other parts of the main memory to other tasks with different priority. In other words, we propose a specialized memory management system which caters to the needs of the ARM model in such a way that the proposed data structure is constructed in the available allocated primary memory first. If at any point the structure grows out of the allocated memory quota, it is forced to be partially saved on secondary memory. The secondary memory version of the structure is accessed in a block-by-block basis so that both the spatial and temporal localities of the I/O access are optimized. Thus, the proposed framework takes control of the virtual memory access and hence manages the required virtual memory in an optimal way to the best benefit of the mining process to be served. Several clever data structures are used to facilitate these optimizations. Our method has the additional advantage that other tasks of different priorities may run concurrently with the main mining task with as little interference as possible because we do not rely on the default paging mechanism of the VMM. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
引用
收藏
页码:154 / 172
页数:19
相关论文
共 31 条
[1]   DRFP-tree: disk-resident frequent pattern tree [J].
Adnan, Muhaimenul ;
Alhajj, Reda .
APPLIED INTELLIGENCE, 2009, 30 (02) :84-97
[2]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[3]  
Agrawal R., 1994, VLDB 1994, P487
[4]   A new and versatile method for association generation [J].
Amir, A ;
Feldman, R ;
Kashi, R .
INFORMATION SYSTEMS, 1997, 22 (6-7) :333-347
[5]  
[Anonymous], P 2 INT C KNOWL DISC
[6]  
[Anonymous], 2004, P 2004 ACM S APPL CO, DOI DOI 10.1145/967900.968012
[7]  
[Anonymous], P 10 ACM INT C KNOWL
[8]  
[Anonymous], 2002, P 8 ACM SIGKDD INT C, DOI DOI 10.1145/775047.775081
[9]  
BUEHRER G, 2006, P 12 ACM SIGKDD INT, P86
[10]   Constructing suffix tree for gigabyte sequences with megabyte memory [J].
Cheung, CF ;
Yu, JX ;
Lu, HJ .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (01) :90-105