A new MapReduce associative classifier based on a new storage format for large-scale imbalanced data

Cited by: 5
Authors
Almasi, Mehrdad [1]
Abadeh, Mohammad Saniee [1]
Affiliations
[1] Tarbiat Modares Univ, Fac Elect & Comp Engn, Tehran, Iran
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2018 / Vol. 21 / Issue 04
Keywords
MapReduce; Associative classifiers; Data storage format; Big data; COMPLEXITY-MEASURES; GENETIC ALGORITHM; PREDICTION; ENSEMBLE; RULES; MINE; SET;
DOI
10.1007/s10586-018-2812-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Knowledge discovery from large, high-dimensional datasets has become a popular research topic. Classification is a key task in bioinformatics, business intelligence, decision science, astronomy, physics, and other fields. Building associative classifiers has been a notable research interest in recent years because of their superior accuracy. In associative classifiers, applying under-sampling or over-sampling methods to imbalanced big datasets reduces accuracy or increases running time, respectively. Hence, there is a significant need for efficient associative classifiers for imbalanced big data problems. These classifiers should be able to handle challenges such as memory usage, running time, and efficient exploration of the search space. To this end, efficient calculation of measures is a primary objective for associative classifiers. In this paper, we propose a new efficient associative classifier for big imbalanced datasets. The proposed method is based on Rare-PEARs (a multi-objective evolutionary algorithm that efficiently discovers rare and reliable association rules) and is able to evaluate rules in a distributed manner by using a new data storage format. This format simplifies the calculation of measures and is fully compatible with the MapReduce programming model. We have applied the proposed method (RPII) to a well-known big dataset (ECBDL'14) and have compared our results with seven other learning methods. The experimental results show that RPII outperforms the other methods on the sensitivity and final score measures (approximately 0.74 and 0.54, respectively). The results demonstrate that the proposed method is a good candidate for large-scale classification problems; furthermore, it achieves reasonable execution time when the target platform is a typical computer cluster.
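The abstract mentions evaluating rules in a distributed manner under the MapReduce model. As a rough illustration only, the sketch below computes the standard support and confidence of a single class association rule with a map step per data partition and a reduce step that sums the counts. It does not reproduce the paper's storage format or the RPII algorithm; the toy data, the rule, and the names `map_partition` and `evaluate_rule` are all hypothetical, and the partitions are plain in-memory lists rather than a real cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy data: two partitions of (feature dict, class label) records.
partitions = [
    [({"a": 1, "b": 0}, "pos"), ({"a": 1, "b": 1}, "neg")],
    [({"a": 1, "b": 0}, "pos"), ({"a": 0, "b": 0}, "neg")],
]

# Hypothetical rule: IF a == 1 AND b == 0 THEN class = "pos".
antecedent = {"a": 1, "b": 0}
consequent = "pos"


def map_partition(records, antecedent, consequent):
    """Map step: count rule matches inside one partition."""
    counts = Counter()
    for features, label in records:
        covers = all(features.get(k) == v for k, v in antecedent.items())
        counts["total"] += 1
        counts["antecedent"] += int(covers)
        counts["both"] += int(covers and label == consequent)
    return counts


def evaluate_rule(partitions, antecedent, consequent):
    """Reduce step: sum per-partition counts, then derive the measures."""
    mapped = (map_partition(p, antecedent, consequent) for p in partitions)
    totals = reduce(lambda x, y: x + y, mapped, Counter())
    # Counter returns 0 for missing keys, so the guards below are safe.
    support = totals["both"] / totals["total"] if totals["total"] else 0.0
    confidence = (
        totals["both"] / totals["antecedent"] if totals["antecedent"] else 0.0
    )
    return totals, support, confidence


totals, support, confidence = evaluate_rule(partitions, antecedent, consequent)
print(totals["total"], support, confidence)
```

Because each partition only emits a few integer counts, the reduce step is a simple sum, which is the property that makes this kind of measure calculation a natural fit for MapReduce.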
Pages: 1821-1847
Page count: 27
Related papers
85 entries in total
[1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2] Ajlouni M.I. A., 2013, Eur. J. Bus. Manag, V5, P36
[3] Alatas, Bilal; Akin, Erhan; Karci, Ali. Modenar: Multi-objective differential evolution algorithm for mining numeric association rules [J]. APPLIED SOFT COMPUTING, 2008, 8 (01): 646-656
[4] Almasi, Mehrdad; Abadeh, Mohammad Saniee. Rare-PEARs: A new multi objective evolutionary algorithm to mine rare and non-redundant quantitative association rules [J]. KNOWLEDGE-BASED SYSTEMS, 2015, 89: 366-384
[5] [Anonymous], 2016, Pattern Recognition and Machine Learning
[6] [Anonymous], 2018, DISTRIB PARALLEL DAT
[7] [Anonymous], MATH PROB ENG
[8] [Anonymous], 2016, J SUPERCOMPUT, DOI 10.1007/s11227-015-1501-1
[9] [Anonymous], 2017, J BIG DATA
[10] [Anonymous], 1995, EFFICIENT ALGORITHM