Maximizing the Efficiency of Parallel Apriori Algorithm

Cited by: 4
Authors
Shah, Ketan D. [1 ]
Mahajan, Sunita [2 ]
Affiliations
[1] SVKMs NMIMS Univ, Dept Informat Technol, MPSTME, Mumbai, Maharashtra, India
[2] MET, Inst Comp Sci, Bombay, Maharashtra, India
Source
2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009) | 2009
Keywords
Data Mining; Association Rules; Parallel Mining; Apriori Algorithm; Frequent Item-set Generation; Load Balancing;
DOI
10.1109/ARTCom.2009.73
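Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes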
0808 ; 0809 ;
Abstract
In this paper we attempt to maximize the efficiency of the parallel Apriori algorithm. The paper analyzes the performance of the algorithm over different datasets and over n processors on a commodity cluster of machines. In the Apriori algorithm, all processes must synchronize after every pass. If any process is assigned more load than the others, the slowest process dictates the speed of the whole program. It is therefore important to balance the load equally among all processes. Our algorithm determines the number of running processes and divides the load equally among them to maximize system performance and efficiency. The experiments show that the parallel algorithm scales well with the number of processes and that effective load balancing improves its efficiency.
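The load-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a count-distribution style pass in which the transactions are split into near-equal partitions, each partition is counted independently, and the counts are merged at the synchronization point. The function names (`split_evenly`, `count_local`, `one_pass`) and the toy data are hypothetical, and the "processes" are simulated by a plain loop rather than real cluster nodes.

```python
from collections import Counter
from itertools import combinations


def split_evenly(n_items, n_procs):
    """Return (start, stop) index ranges whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def count_local(transactions, candidates):
    """Count how many of the given transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def one_pass(transactions, candidates, n_procs, min_support):
    """One Apriori pass: count each partition separately, then merge globally."""
    total = Counter()
    for start, stop in split_evenly(len(transactions), n_procs):
        # Each iteration stands in for one process (an assumption for this
        # sketch); the merge plays the role of the per-pass synchronization.
        total.update(count_local(transactions[start:stop], candidates))
    return {c: n for c, n in total.items() if n >= min_support}


# Toy data, invented purely for illustration.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
)]
items = sorted(set().union(*transactions))
pairs = [frozenset(p) for p in combinations(items, 2)]
frequent_pairs = one_pass(transactions, pairs, n_procs=3, min_support=3)
```

Because partition sizes differ by at most one transaction, no simulated process waits long for another at the merge step, and the merged counts do not depend on how many partitions are used.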
Pages: 107 / +
Page count: 2