Bayesian Networks with Structural Restrictions: Parallelization, Performance, and Efficient Cross-Validation

Cited by: 7
Authors
Peng, Hao [1]
Jin, Zhe [1]
Miller, John A. [1]
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Source
2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017) | 2017
Keywords
Big data; Analytics; Data mining; Classification; Parallel programming; Bayesian networks
DOI
10.1109/BigDataCongress.2017.11
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Bayesian Network algorithms are widely applied in the fields of bioinformatics, document classification, big data, and marketing informatics. In this paper, several Bayesian Network algorithms are evaluated, including Naive Bayes, Tree Augmented Naive Bayes, k-BAN, and k-BAN with Order Swapping. The algorithms are implemented using Scala and compared with the bnlearn library in R and Weka. Several datasets with varying numbers of attributes and instances are used to test the accuracy and efficiency of the implementations of the algorithms provided by the three packages. When handling huge datasets, issues involving accuracy, efficiency, and serial vs. parallel execution become more critical and should be addressed. We implemented several parallel algorithms as well as an efficient way to perform cross-validations, resulting in significant speedups.
Pages: 7-14
Page count: 8