PaWI: Parallel Weighted Itemset Mining by means of MapReduce

Cited by: 5
Authors
Baralis, Elena [1 ]
Cagliero, Luca [1 ]
Garza, Paolo [1 ]
Grimaudo, Luigi [1 ]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, Turin, Italy
Source
2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015 | 2015
Keywords
H.2.8.b Clustering, classification, and association rules; H.2.8.d Data mining
DOI
10.1109/BigDataCongress.2015.14
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Frequent itemset mining is an exploratory data mining technique that has been fruitfully exploited to extract recurrent co-occurrences between data items. Since in many application contexts items are enriched with weights denoting their relative importance in the analyzed data, pushing item weights into the itemset mining process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient in-memory weighted itemset mining algorithms are available in the literature, there is a lack of parallel and distributed solutions that are able to scale towards Big Weighted Data. This paper presents a scalable frequent weighted itemset mining algorithm based on the MapReduce paradigm. To demonstrate its actionability and scalability, the proposed algorithm was tested on a real Big dataset collecting approximately 34 million reviews of Amazon items. Weights indicate the ratings given by users to the purchased items. The mined itemsets represent combinations of items that were frequently bought together with an overall rating above average.
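
The idea in the abstract can be illustrated with a small single-machine sketch of a map step and a reduce step. This is not the PaWI algorithm itself, whose details the abstract does not give. The function names, the sample data, and the rule that an itemset's weight in a transaction is the minimum weight of its items are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(transaction, max_size=2):
    """Map step: emit (itemset, weight) pairs for one weighted transaction.

    `transaction` maps item -> weight (e.g. a user's rating of that item).
    Assumption: an itemset's weight in a transaction is the minimum weight
    of its items; the paper's actual weighting function may differ.
    """
    items = sorted(transaction)
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, min(transaction[i] for i in itemset)


def reduce_weights(pairs):
    """Reduce step: sum the emitted weights per itemset."""
    totals = defaultdict(float)
    for itemset, weight in pairs:
        totals[itemset] += weight
    return dict(totals)


def mine_weighted_itemsets(transactions, min_wsupport, max_size=2):
    """Keep itemsets whose summed weight reaches the (hypothetical) threshold."""
    pairs = (p for t in transactions for p in map_transaction(t, max_size))
    totals = reduce_weights(pairs)
    return {k: v for k, v in totals.items() if v >= min_wsupport}


# Hypothetical rated purchases: item -> rating given by one user.
transactions = [
    {"book": 5, "lamp": 4},
    {"book": 4, "lamp": 5, "pen": 1},
    {"book": 3, "pen": 2},
]
result = mine_weighted_itemsets(transactions, min_wsupport=8)
print(result)
```

In a real MapReduce job the map calls would run in parallel over partitions of the data and the framework would group the emitted pairs by itemset before the reduce step. Enumerating all candidate itemsets per transaction, as done here, only stays tractable for small `max_size`.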
Pages: 25-32
Page count: 8