Bayesian Networks with Structural Restrictions: Parallelization, Performance, and Efficient Cross-Validation

Cited by: 7
Authors
Peng, Hao [1]
Jin, Zhe [1]
Miller, John A. [1]
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Source
2017 IEEE 6th International Congress on Big Data (BigData Congress 2017) | 2017
Keywords
Big data; Analytics; Data mining; Classification; Parallel programming; Bayesian networks
DOI
10.1109/BigDataCongress.2017.11
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Bayesian Network algorithms are widely applied in the fields of bioinformatics, document classification, big data, and marketing informatics. In this paper, several Bayesian Network algorithms are evaluated, including Naive Bayes, Tree Augmented Naive Bayes, k-BAN, and k-BAN with Order Swapping. The algorithms are implemented using Scala and compared with the bnlearn library in R and Weka. Several datasets with varying numbers of attributes and instances are used to test the accuracy and efficiency of the implementations of the algorithms provided by the three packages. When handling huge datasets, issues involving accuracy, efficiency, and serial vs. parallel execution become more critical and should be addressed. We implemented several parallel algorithms as well as an efficient way to perform cross-validations, resulting in significant speedups.
Pages: 7-14
Number of pages: 8