Scalable out-of-core itemset mining

Cited by: 5
Authors
Baralis, Elena [1]
Cerquitelli, Tania [1]
Chiusano, Silvia [1]
Grand, Alberto [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10129 Turin, Italy
Keywords
Itemset mining; Data mining; Index support; Frequent; Algorithms
DOI
10.1016/j.ins.2014.08.073
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Itemset mining looks for correlations among data items in large transactional datasets. Traditional in-core mining algorithms do not scale well with huge data volumes, and are hindered by critical issues such as long execution times due to massive memory swap and main-memory exhaustion. This work is aimed at overcoming the scalability issues of existing in-core algorithms by improving their memory usage. A persistent structure, VLDBMine, to compactly store huge transactional datasets on disk and efficiently support large-scale itemset mining is proposed. VLDBMine provides a compact and complete representation of the data, by exploiting two different data structures suitable for diverse data distributions, and includes an appropriate indexing structure, allowing selective data retrieval. Experimental validation, performed on both real and synthetic datasets, shows the compactness of the VLDBMine data structure and the efficiency and scalability on large datasets of the mining algorithms supported by it. (C) 2014 Elsevier Inc. All rights reserved.
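As a point of reference for the task the abstract describes, the sketch below shows a plain in-core, level-wise frequent itemset miner. It is not VLDBMine and does not reproduce the paper's disk-resident structures or index; it only illustrates what "itemset mining" computes. The function name `frequent_itemsets`, the `min_support` parameter (an absolute transaction count), and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that appears in
    at least `min_support` transactions (illustrative in-core sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support with one full pass over the in-memory data.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        result.update(current)
        k += 1
    return result


# Hypothetical sample data: five small transactions.
sample = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
found = frequent_itemsets(sample, min_support=3)
```

Every level rescans the whole transaction list held in main memory, which is the kind of memory pressure on large datasets that the abstract says motivates a compact, indexed on-disk representation.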
Pages: 146-162
Number of pages: 17