Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce

被引:3
作者
Li, Shun [1 ]
Wang, Biao [1 ]
机构
[1] Univ Int Relat, Sch Informat Sci & Technol, Beijing 100091, Peoples R China
来源
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2018年 / 90卷 / 8-9期
关键词
Bayesian network; Structure learning; MapReduce; Hybrid learning; ALGORITHM; PARALLEL;
D O I
10.1007/s11265-017-1275-1
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Bayesian Network (BN) is the popular and important data-mining model for representing uncertain knowledge. Much work has been done on migrating the BN structure learning algorithms, such as constraint-based (CB) and score-and-search-based (SSB) ones, to the MapReduce framework. But this approach is not suitable for hybrid algorithms, which have to conduct the Map and Reduce operation for all the data to get the scores, but not just the scores of the data in the pruned structures as in the traditional centralized version of hybrid algithm. This means the most time-comsuming part of the algorithm, the Map operations, will be run twice, once in CB and once in SSB. So in the MapReduce framework, when facing massive data, the simple migration of the traditional hybrid algorithm is almost equivalent to executing the CB and SSB sequentially, with little advantage. In this paper, we introduce a distributed hybrid BN structure learning algorithm. By using constraints and search methods that require the same data basis, the algorithm only needs to conduct the Map operation only once, in the CB stage, to prepare the data for the calculation of constraints and scores. Then it reuses intermediate results of constraints calculation in the SSB stage without Mapping the whole data again, thus greatly simplified the computing work. Experiment results show that the efficiency of the algorithm is more than doubled compared to the SSB, and the accuracy is improved by about 36% compared to the CB.
引用
收藏
页码:1115 / 1121
页数:7
相关论文
共 28 条
[1]  
[Anonymous], 2010, ARTIF INTELL
[2]   Structural Learning of Bayesian Networks Via Constrained Hill Climbing Algorithms: Adjusting Trade-off between Efficiency and Accuracy [J].
Arias, Jacinto ;
Gamez, Jose A. ;
Puerta, Jose M. .
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2015, 30 (03) :292-325
[3]  
BRADSKI G., 2007, NIPS, P281
[4]  
Brenner E., 2013, P 20 9 C UNCERTAINTY, P112
[5]   A Method for Integrating Expert Knowledge When Learning Bayesian Networks From Data [J].
Cano, Andres ;
Masegosa, Andres R. ;
Moral, Serafin .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2011, 41 (05) :1382-1394
[6]  
Chen W., 2013, Mathematical Problems in Engineering, Volume, V2013, P1, DOI DOI 10.1016/JJC0NRE1.2013.01.001
[7]   Learning Bayesian networks from data: An information-theory based approach [J].
Cheng, J ;
Greiner, R ;
Kelly, J ;
Bell, D ;
Liu, WR .
ARTIFICIAL INTELLIGENCE, 2002, 137 (1-2) :43-90
[8]  
Cheng J., 2011, POWER CONSTRUCTOR SY
[9]  
Chickering D., 1996, Learning from data: Artificial intelligence and statistics V, P121, DOI 10.1007/978-1-4612-2404-4_12
[10]  
COOPER GF, 1992, MACH LEARN, V9, P309, DOI 10.1007/BF00994110