Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

被引:0
作者
Pernkopf, Franz [1 ]
Bilmes, Jeff A. [2 ]
机构
[1] Graz Univ Technol, Dept Elect Engn, A-8010 Graz, Austria
[2] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
基金
奥地利科学基金会;
关键词
Bayesian networks; classification; discriminative learning; structure learning; graphical model; missing feature; CLASSIFICATION; INFORMATION; REGRESSION; COMPLEXITY;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features. They are based on the conditional mutual information and classification rate (i.e., risk), respectively. Given an ordering, we can find a discriminative structure with O(Nk+1) score evaluations (where constant k is the tree-width of the sub-graph over the attributes). We present results on 25 data sets from the UCI repository, for phonetic classification using the TIMIT database, for a visual surface inspection task, and for two handwritten digit recognition tasks. We provide classification performance for both discriminative and generative parameter learning on both discriminatively and generatively structured networks. The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (greedy) Bayesian network learning approach, but does so with a factor of similar to 10-40 speedup. We also show that the advantages of generative discriminatively structured Bayesian network classifiers still hold in the case of missing features, a case where generative classifiers have an advantage over discriminative classifiers.
引用
收藏
页码:2323 / 2360
页数:38
相关论文
共 70 条
[1]   Learning Bayesian network classifiers: Searching in a space of partially directed acyclic graphs [J].
Acid, S ;
De Campos, LM ;
Castellano, JG .
MACHINE LEARNING, 2005, 59 (03) :213-235
[2]  
[Anonymous], 1995, 12 INT C MACH LEARN
[3]  
[Anonymous], 1991, ELEMENTS INFORM THEO, DOI [DOI 10.1002/0471200611, 10.1002/0471200611]
[4]  
[Anonymous], 1999, Learning in Graphical Models
[5]  
[Anonymous], 1993, Proceedings of the 13th International Joint Conference on Artificial Intelligence
[6]  
[Anonymous], 2006, Pattern recognition and machine learning
[7]   COMPLEXITY OF FINDING EMBEDDINGS IN A K-TREE [J].
ARNBORG, S ;
CORNEIL, DG ;
PROSKUROWSKI, A .
SIAM JOURNAL ON ALGEBRAIC AND DISCRETE METHODS, 1987, 8 (02) :277-284
[8]  
Bahl L. R., 1986, ICASSP 86 Proceedings. IEEE-IECEJ-ASJ International Conference on Acoustics, Speech and Signal Processing (Cat. No.86CH2243-4), P49
[9]   Convexity, classification, and risk bounds [J].
Bartlett, PL ;
Jordan, MI ;
McAuliffe, JD .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) :138-156
[10]  
Bernardo J., 2007, BAYESIAN STAT, V8, P3, DOI DOI 10.1007/978-3-642-93220-5_6