Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

被引:0
作者
Pernkopf, Franz [1 ]
Bilmes, Jeff A. [2 ]
机构
[1] Graz Univ Technol, Dept Elect Engn, A-8010 Graz, Austria
[2] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
基金
奥地利科学基金会;
关键词
Bayesian networks; classification; discriminative learning; structure learning; graphical model; missing feature; CLASSIFICATION; INFORMATION; REGRESSION; COMPLEXITY;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features. They are based on the conditional mutual information and classification rate (i.e., risk), respectively. Given an ordering, we can find a discriminative structure with O(Nk+1) score evaluations (where constant k is the tree-width of the sub-graph over the attributes). We present results on 25 data sets from the UCI repository, for phonetic classification using the TIMIT database, for a visual surface inspection task, and for two handwritten digit recognition tasks. We provide classification performance for both discriminative and generative parameter learning on both discriminatively and generatively structured networks. The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (greedy) Bayesian network learning approach, but does so with a factor of similar to 10-40 speedup. We also show that the advantages of generative discriminatively structured Bayesian network classifiers still hold in the case of missing features, a case where generative classifiers have an advantage over discriminative classifiers.
引用
收藏
页码:2323 / 2360
页数:38
相关论文
共 70 条
[21]   The sample complexity of learning fixed-structure Bayesian networks [J].
Dasgupta, S .
MACHINE LEARNING, 1997, 29 (2-3) :165-180
[22]  
de Campos LM, 2006, J MACH LEARN RES, V7, P2149
[23]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[24]   A MINIMUM DISCRIMINATION INFORMATION APPROACH FOR HIDDEN MARKOV MODELING [J].
EPHRAIM, Y ;
DEMBO, A ;
RABINER, LR .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1989, 35 (05) :1001-1013
[25]   ON THE RELATIONS BETWEEN MODELING APPROACHES FOR SPEECH RECOGNITION [J].
EPHRAIM, Y ;
RABINER, LR .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1990, 36 (02) :372-380
[26]   Bayesian network classifiers [J].
Friedman, N ;
Geiger, D ;
Goldszmidt, M .
MACHINE LEARNING, 1997, 29 (2-3) :131-163
[27]  
Friedman N, 1999, UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, P196
[28]   Knowledge representation and inference in similarity networks and Bayesian multinets [J].
Geiger, D ;
Heckerman, D .
ARTIFICIAL INTELLIGENCE, 1996, 82 (1-2) :45-74
[29]  
Greiner K, 2002, EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, P167
[30]   Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers [J].
Greiner, R ;
Su, XY ;
Shen, B ;
Zhou, W .
MACHINE LEARNING, 2005, 59 (03) :297-322