Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

Cited by: 8
Authors
Yu, Kun-Ming [1]
Liu, Sheng-Hui [2]
Zhou, Li-Wei [2]
Wu, Shu-Hao [1]
Affiliations
[1] Chung Hua Univ, Dept Comp Sci & Informat Engn, Hsinchu, Taiwan
[2] Harbin Univ Sci & Technol, Sch Software, Harbin, Heilongjiang, Peoples R China
Keywords
Apriori; Frequent Pattern Mining; Load Balancing; Multi-Core; Parallel Data Mining
DOI
10.4018/IJGHPC.2015040106
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Frequent pattern mining plays an essential role in knowledge discovery and data mining tasks that seek usable patterns in databases. Efficiency is especially crucial for an algorithm that must find frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most of the algorithms were proposed without considering the prevalent multi-core architectures. This study proposes two high-efficiency, load-balancing parallel data mining methods based on the Apriori algorithm and designed for multi-core architectures. The main goal of the proposed algorithms was to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in a detailed experimental study, the algorithms performed better than the previous methods. The experimental results demonstrated that the proposed algorithms dramatically reduced computation time as more threads were used. Moreover, the observations showed that the workload was equally balanced among the computing units.
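As a rough illustration of the ideas named in the abstract, the following minimal sketch shows a textbook level-wise Apriori in which candidates are collected in a set (so no duplicate candidate is ever counted twice) and support counting is split across worker threads in round-robin chunks of near-equal size. It is not the authors' algorithm; the function names (`count_support`, `generate_candidates`, `apriori`), the `workers` parameter, and the round-robin partitioning are assumptions made only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_support(candidates, transactions):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets.

    Storing the results in a set discards duplicate candidates, and the
    subset check applies the usual Apriori pruning rule.
    """
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(s) in frequent for s in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori(transactions, min_support, workers=4):
    """Return {itemset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    result, k = {}, 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while candidates:
            # Hypothetical balancing scheme: deal candidates out round-robin
            # so every worker receives a chunk of nearly equal size.
            ordered = sorted(candidates, key=sorted)
            chunks = [ordered[i::workers] for i in range(workers)]
            counts = {}
            for part in pool.map(count_support, chunks, [transactions] * workers):
                counts.update(part)
            frequent = {c for c, n in counts.items() if n >= min_support}
            result.update({c: counts[c] for c in frequent})
            k += 1
            candidates = generate_candidates(frequent, k)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(data, 3).items(), key=lambda p: sorted(p[0])):
        print(sorted(itemset), support)
```

In standard CPython the global interpreter lock limits the speedup that threads give for this kind of CPU-bound counting, so the sketch illustrates the partitioning idea rather than the performance reported in the paper.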
Pages: 77-99
Number of pages: 23