Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark

Cited by: 30
Authors
Arias, Jacinto [1]
Gamez, Jose A. [1]
Puerta, Jose M. [1]
Affiliations
[1] Univ Castilla La Mancha, Dept Comp Syst, Albacete, Spain
Keywords
Bayesian Network Classifiers; MapReduce; Big Data; High dimensionality; Apache Hadoop; Apache Spark
DOI
10.1016/j.knosys.2016.06.013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The challenge of scalability has always been a focus of Machine Learning research, where improved algorithms and new techniques are proposed on a constant basis to deal with more complex problems. With the advent of Big Data, this challenge has intensified, as new large-scale datasets overwhelm the majority of available techniques. The community has resorted to Cloud Computing and distributed programming paradigms as the most immediate solution, where Apache Spark has proven to be the most promising framework. In this paper we focus on the problem of supervised classification, exploring the family of so-called Bayesian Network Classifiers by studying their adaptability to the MapReduce and Apache Spark frameworks. We analyse a range of algorithms and propose distributed versions of them. Our approach is based on a general framework for learning these probabilistic models from large-scale and high-dimensional data, the latter being a problem with less support in the literature. We also present an extensive experimental evaluation of our proposal over a wide set of problems and different elastic configurations of a computing cluster to show the full extent of the scalability properties of our framework. Additional material and the software to reproduce our experiments can be found on the supplementary website http://simd.albacete.org/supplements/distributed_bncs.html. © 2016 Elsevier B.V. All rights reserved.
Pages: 16-26
Number of pages: 11