Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Cited by: 1
Authors
Wang, Limin [1 ,2 ]
Wang, Junjie [2 ]
Guo, Lu [2 ]
Li, Qilong [3 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
[3] Jilin Univ, Coll Instrumentat & Elect Engn, Changchun 130012, Peoples R China
Keywords
Bayesian network classifier; Attribute independence assumption; Ensemble learning; Log-likelihood function; Instance learning; Naive Bayes
DOI
10.1007/s10489-023-05242-8
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Naive Bayes (NB) is one of the top ten machine learning algorithms, but its attribute independence assumption rarely holds in practice. A feasible and efficient approach to improving NB is to relax the assumption by adding augmented edges to the restricted topology of NB. In this paper we prove theoretically that the generalized topology may be a suboptimal solution for modeling multivariate probability distributions if its fitness to the data cannot be measured. We therefore propose to apply the log-likelihood function as the scoring function and introduce an efficient heuristic search strategy to explore high-dependence relationships; at each iteration the learned topology is improved to fit the data better. The proposed algorithm, called the log-likelihood Bayesian classifier (LLBC), learns two submodels, one from the labeled training set and one from each individual unlabeled testing instance, and then makes them work jointly for classification in an ensemble learning framework. Our extensive experimental evaluations on 36 benchmark datasets from the University of California at Irvine (UCI) machine learning repository reveal that LLBC demonstrates excellent classification performance and provides a competitive approach to learning from labeled and unlabeled data.
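The scoring-based search sketched in the abstract can be illustrated with a minimal toy implementation. This is not the authors' LLBC algorithm: the function names, the Laplace smoothing, and the K2-style fixed attribute ordering (an attribute may only take an earlier attribute as its augmented parent, which keeps the graph acyclic) are assumptions made here for illustration. The sketch scores a one-parent-augmented naive Bayes structure by its smoothed training log-likelihood and greedily keeps only the edges that raise the score.

```python
from collections import Counter
import math

def log_likelihood(X, y, parent, alpha=1.0):
    """Smoothed log-likelihood of discrete data under a one-parent-augmented
    NB structure. parent[j] is None (class-only parent) or the index of the
    extra parent attribute of x_j."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    ll = 0.0
    # Class prior term: sum_i log P(y_i), Laplace-smoothed.
    cy = Counter(y)
    for c in y:
        ll += math.log((cy[c] + alpha) / (n + alpha * len(classes)))
    # Conditional terms: sum_i log P(x_ij | y_i [, x_i,parent(j)]).
    for j in range(d):
        vals = sorted(set(row[j] for row in X))
        cond = Counter()  # counts of (context, value)
        ctx = Counter()   # counts of context
        for row, c in zip(X, y):
            key = (c, row[parent[j]]) if parent[j] is not None else (c,)
            cond[(key, row[j])] += 1
            ctx[key] += 1
        for row, c in zip(X, y):
            key = (c, row[parent[j]]) if parent[j] is not None else (c,)
            ll += math.log((cond[(key, row[j])] + alpha)
                           / (ctx[key] + alpha * len(vals)))
    return ll

def greedy_augment(X, y):
    """Hill-climb over augmented edges: start from plain NB and accept an
    edge p -> j only if it raises the smoothed log-likelihood score."""
    d = len(X[0])
    parent = [None] * d
    best = log_likelihood(X, y, parent)
    improved = True
    while improved:
        improved = False
        for j in range(d):
            for p in range(j):  # only earlier attributes: keeps the DAG acyclic
                old = parent[j]
                if old == p:
                    continue
                parent[j] = p
                score = log_likelihood(X, y, parent)
                if score > best:
                    best = score
                    improved = True
                else:
                    parent[j] = old  # reject: revert the edge
    return parent, best
```

Because the score is smoothed, adding an edge does not automatically increase the training log-likelihood: an augmented parent that fragments the data into sparse contexts is rejected, which is what makes the greedy acceptance test meaningful.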
Pages: 1957-1979 (23 pages)