Effect of data skewness in parallel mining of association rules

被引:0
作者
Cheung, DW [1 ]
Xiao, YQ [1 ]
机构
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong
来源
RESEARCH AND DEVELOPMENT IN KNOWLEDGE DISCOVERY AND DATA MINING | 1998年 / 1394卷
关键词
association rules; data mining; data skewness; parallel computing;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
An efficient parallel algorithm FPM(Fast Parallel Mining) for mining association rules on a shared-nothing parallel system has been proposed. It adopts the count distribution approach and has incorporated two powerful candidate pruning techniques, i.e., distributed pruning and global pruning. It has a simple communication scheme which performs only one round of message exchange in each iteration. We found that the two pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even for the mild data skewness case. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. It has also shown that FPM outperforms CD (Count Distribution) consistently, which is a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM has nice parallelism of speedup, scaleup and sizeup.
引用
收藏
页码:48 / 60
页数:13
相关论文
共 16 条
[1]   Parallel mining of association rules [J].
Agrawal, R ;
Shafer, JC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :962-969
[2]  
Agrawal R, 1993, P 1993 ACM SIGM INT
[3]  
[Anonymous], P PYOC ACM SIGMOD IN
[4]  
[Anonymous], PROC
[5]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[6]  
Cheung DW, 1996, PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED INFORMATION SYSTEMS, P31, DOI 10.1109/PDIS.1996.568665
[7]  
CHEUNG DW, 1996, P 12 INT C DAT ENG N
[8]  
Cover T. M., 2005, ELEM INF THEORY, DOI 10.1002/047174882X
[9]  
HAN EH, 1997, P ACM SIGMOD INT C M, P277
[10]  
HAN J, 1995, P 21 VLDB C ZUR SWIT