Exploiting GPU and cluster parallelism in single scan frequent itemset mining

被引:41
作者
Djenouri, Youcef [1 ]
Djenouri, Djamel [2 ]
Belhadi, Asma [3 ]
Cano, Alberto [4 ]
机构
[1] Southern Denmark Univ, Dept Math & Comp Sci, IMADA, Odense, Denmark
[2] CERIST Res Ctr, DTISI, Algiers, Algeria
[3] USTHB, RIMA Lab, Algiers, Algeria
[4] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA USA
关键词
Frequent itemset mining; High-performance computing; Support computing; Big data; GPU;
D O I
10.1016/j.ins.2018.07.020
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper considers discovering frequent itemsets in transactional databases and addresses the time complexity problem by using high performance computing (HPC). Three HPC versions of the Single Scan (SS) algorithm are proposed. The first one (GSS) implements SS on a GPU (Graphics Processing Unit) architecture using an efficient mapping between thread blocks and the input data. The second approach (CSS) implements SS on a cluster architecture by scheduling independent jobs to workers in a cluster. The third, (CGSS) accelerates the frequent itemset mining process by using multiple cluster nodes equipped with GPUs. Moreover, three partitioning strategies are proposed to reduce GPU thread divergence and cluster load imbalance. Results show that CGSS outperforms SS, GSS, and CSS in terms of speedup. Specifically, CGSS provides up to a 350 times speedup for low minimum support values on large datasets. GCSS demonstrably outperforms the state-of-the-art HPC-based algorithms on big databases. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:363 / 377
页数:15
相关论文
共 43 条
[1]   Implementation of Association Rule Mining using CUDA [J].
Adil, Syed Hasan ;
Qamar, Sadaf .
ICET: 2009 INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES, PROCEEDINGS, 2009, :332-+
[2]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[3]  
[Anonymous], 2009, P 5 INT WORKSHOP DAT, DOI DOI 10.1145/1565694.1565702
[4]   Two Maximum Entropy-Based Algorithms for Running Quantile Estimation in Nonstationary Data Streams [J].
Arandjelovic, Ognjen ;
Pham, Duc-Son ;
Venkatesh, Svetha .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2015, 25 (09) :1469-1479
[5]   An Iterative MapReduce Based Frequent Subgraph Mining Algorithm [J].
Bhuiyan, Mansurul A. ;
Al Hasan, Mohammad .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) :608-620
[6]   Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility [J].
Buyya, Rajkumar ;
Yeo, Chee Shin ;
Venugopal, Srikumar ;
Broberg, James ;
Brandic, Ivona .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2009, 25 (06) :599-616
[8]   High performance evaluation of evolutionary-mined association rules on GPUs [J].
Cano, Alberto ;
Maria Luna, Jose ;
Ventura, Sebastian .
JOURNAL OF SUPERCOMPUTING, 2013, 66 (03) :1438-1461
[9]  
Cryans Jean-Daniel, 2010, Proceedings of the Second International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2010), P185, DOI 10.1109/DBKDA.2010.34
[10]  
Cui Q., 2012, Proceedings of the 2nd International Conference on Green Communications and Networks, V224, P215