Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Cited by: 1
Authors
Wang, Limin [1 ,2 ]
Wang, Junjie [2 ]
Guo, Lu [2 ]
Li, Qilong [3 ]
Affiliations
[1] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
[3] Jilin Univ, Coll Instrumentat & Elect Engn, Changchun 130012, Peoples R China
Keywords
Bayesian network classifier; Attribute independence assumption; Ensemble learning; Log-likelihood function; Instance learning; Naive Bayes
DOI
10.1007/s10489-023-05242-8
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Naive Bayes (NB) is one of the top ten machine learning algorithms, but its attribute independence assumption rarely holds in practice. A feasible and efficient approach to improving NB is to relax the assumption by adding augmented edges to the restricted topology of NB. In this paper we prove theoretically that the generalized topology may be a suboptimal solution for modeling multivariate probability distributions if its fitness to the data cannot be measured. We therefore propose to apply the log-likelihood function as the scoring function and introduce an efficient heuristic search strategy to explore high-dependence relationships; at each iteration the learned topology is improved to fit the data better. The proposed algorithm, called the log-likelihood Bayesian classifier (LLBC), learns two submodels, one from the labeled training set and one from each individual unlabeled testing instance, and then makes them work jointly for classification in an ensemble learning framework. Our extensive experimental evaluations on 36 benchmark datasets from the University of California at Irvine (UCI) machine learning repository reveal that LLBC demonstrates excellent classification performance and provides a competitive approach to learning from labeled and unlabeled data.
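The scoring-based search sketched in the abstract can be illustrated with a minimal toy implementation. This is not the authors' LLBC algorithm: the function names, the Laplace smoothing, and the K2-style fixed attribute ordering (an attribute may only take an earlier attribute as its augmented parent, which keeps the graph acyclic) are assumptions made here for illustration. The sketch scores a one-parent-augmented naive Bayes structure by its smoothed training log-likelihood and greedily keeps only the edges that raise the score.

```python
from collections import Counter
import math

def log_likelihood(X, y, parent, alpha=1.0):
    """Smoothed log-likelihood of discrete data under a one-parent-augmented
    NB structure. parent[j] is None (class-only parent) or the index of the
    extra parent attribute of x_j."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    ll = 0.0
    # Class prior term: sum_i log P(y_i), Laplace-smoothed.
    cy = Counter(y)
    for c in y:
        ll += math.log((cy[c] + alpha) / (n + alpha * len(classes)))
    # Conditional terms: sum_i log P(x_ij | y_i [, x_i,parent(j)]).
    for j in range(d):
        vals = sorted(set(row[j] for row in X))
        cond = Counter()  # counts of (context, value)
        ctx = Counter()   # counts of context
        for row, c in zip(X, y):
            key = (c, row[parent[j]]) if parent[j] is not None else (c,)
            cond[(key, row[j])] += 1
            ctx[key] += 1
        for row, c in zip(X, y):
            key = (c, row[parent[j]]) if parent[j] is not None else (c,)
            ll += math.log((cond[(key, row[j])] + alpha)
                           / (ctx[key] + alpha * len(vals)))
    return ll

def greedy_augment(X, y):
    """Hill-climb over augmented edges: start from plain NB and accept an
    edge p -> j only if it raises the smoothed log-likelihood score."""
    d = len(X[0])
    parent = [None] * d
    best = log_likelihood(X, y, parent)
    improved = True
    while improved:
        improved = False
        for j in range(d):
            for p in range(j):  # only earlier attributes: keeps the DAG acyclic
                old = parent[j]
                if old == p:
                    continue
                parent[j] = p
                score = log_likelihood(X, y, parent)
                if score > best:
                    best = score
                    improved = True
                else:
                    parent[j] = old  # reject: revert the edge
    return parent, best
```

Because the score is smoothed, adding an edge does not automatically increase the training log-likelihood: an augmented parent that fragments the data into sparse contexts is rejected, which is what makes the greedy acceptance test meaningful.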
Pages: 1957-1979 (23 pages)