Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers

Cited by: 0
Authors
Russell Greiner
Xiaoyuan Su
Bin Shen
Wei Zhou
Affiliations
[1] University of Alberta, Department of Computing Science
[2] University of Miami, Electrical & Computer Engineering
[3] University of Alberta, Department of Computing Science
[4] University of Waterloo, School of Computer Science
Source
Machine Learning | 2005 / Vol. 59
Keywords
(Bayesian) belief nets; logistic regression; classification; PAC-learning; computational/sample complexity
Abstract
Bayesian belief nets (BNs) are often used for classification tasks—typically to return the most likely class label for each specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function—viz., likelihood, rather than classification accuracy—typically by first learning an appropriate graphical structure, then finding the parameters for that structure that maximize the likelihood of the data. As these parameters may not maximize the classification accuracy, “discriminative parameter learners” follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, shows how it extends standard logistic regression, and analyzes its inherent sample and computational complexity. We then present a general algorithm for this task, ELR, that applies to arbitrary BN structures and that works effectively even when given incomplete training data. Unfortunately, ELR is not guaranteed to find the parameters that optimize conditional likelihood; moreover, even the optimal-CL parameters need not have minimal classification error. This paper therefore presents empirical evidence that ELR produces effective classifiers, often superior to the ones produced by the standard “generative” algorithms, especially in common situations where the given BN-structure is incorrect.
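The contrast the abstract draws between likelihood and conditional likelihood can be illustrated on the simplest structure, a naive Bayes net with binary variables, where the class posterior has the form of logistic regression. The sketch below is not the paper's ELR algorithm: it is plain gradient ascent on the mean conditional log-likelihood, started from the generative (Laplace-smoothed maximum-likelihood) parameters. The toy data, the function names, the smoothing constant `alpha`, and the step size are all assumptions made for illustration. The second feature duplicates the first, so the naive Bayes independence assumption is deliberately wrong.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_generative(X, y, alpha=1.0):
    """Laplace-smoothed maximum-likelihood naive Bayes (binary class and
    features), returned in logistic form as (bias, weights)."""
    n1 = sum(y)
    n0 = len(y) - n1
    prior1 = (n1 + alpha) / (len(y) + 2 * alpha)
    bias = math.log(prior1 / (1 - prior1))
    w = []
    for j in range(len(X[0])):
        p1 = (sum(x[j] for x, c in zip(X, y) if c == 1) + alpha) / (n1 + 2 * alpha)
        p0 = (sum(x[j] for x, c in zip(X, y) if c == 0) + alpha) / (n0 + 2 * alpha)
        off = math.log((1 - p1) / (1 - p0))   # log-odds term when x_j = 0
        w.append(math.log(p1 / p0) - off)     # extra log-odds when x_j = 1
        bias += off
    return bias, w

def mean_cll(bias, w, X, y):
    """Mean conditional log-likelihood: average of log P(class | features)."""
    total = 0.0
    for x, c in zip(X, y):
        p = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))
        total += math.log(p if c == 1 else 1 - p)
    return total / len(y)

def fit_discriminative(X, y, bias, w, steps=200, lr=0.5):
    """Gradient ascent on the mean conditional log-likelihood,
    starting from the supplied (bias, weights)."""
    w = list(w)
    for _ in range(steps):
        g_b, g_w = 0.0, [0.0] * len(w)
        for x, c in zip(X, y):
            err = c - sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))
            g_b += err
            for j, xj in enumerate(x):
                g_w[j] += err * xj
        bias += lr * g_b / len(y)
        w = [wj + lr * gj / len(y) for wj, gj in zip(w, g_w)]
    return bias, w

# Hypothetical toy data: feature 1 is a copy of feature 0.
X = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0),
     (1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0)]
y = [1, 1, 0, 0, 0, 1, 1, 0]

gen = fit_generative(X, y)
disc = fit_discriminative(X, y, *gen)
cll_gen = mean_cll(*gen, X, y)
cll_disc = mean_cll(*disc, X, y)
print("generative CLL:    ", cll_gen)
print("discriminative CLL:", cll_disc)
```

Because the conditional log-likelihood is concave in these logistic parameters and the step size here is small, each update should not decrease it, so the discriminative parameters end with a conditional likelihood at least as high as the generative starting point. For general structures or incomplete data the objective is no longer this simple, which is the setting the abstract says ELR targets.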
Pages: 297-322
Page count: 25