Dual coordinate descent methods for logistic regression and maximum entropy models

Cited by: 255
Authors
Yu, Hsiang-Fu [1]
Huang, Fang-Lan [1]
Lin, Chih-Jen [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
Keywords
Logistic regression; Maximum entropy; Coordinate descent optimization; Linear classification; Algorithm
DOI
10.1007/s10994-010-5221-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most optimization methods for logistic regression or maximum entropy solve the primal problem. They range from iterative scaling and coordinate descent to quasi-Newton and truncated Newton methods. Less effort has been made to solve the dual problem. In contrast, for linear support vector machines (SVM), methods that solve the dual problem have been shown to be very effective. In this paper, we apply coordinate descent methods to solve the dual form of logistic regression and maximum entropy. Interestingly, many details differ from the situation in linear SVM. We carefully study the theoretical convergence as well as numerical issues. The proposed method is shown to be faster than most state-of-the-art methods for training logistic regression and maximum entropy.
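To make the idea in the abstract concrete, the following is a minimal sketch of coordinate descent on the dual of L2-regularized logistic regression. It assumes the standard dual form, minimizing (1/2) αᵀQα + Σ [α_i log α_i + (C − α_i) log(C − α_i)] over 0 ≤ α_i ≤ C with Q_ij = y_i y_j x_iᵀx_j. The function name `dual_cd_logreg`, its parameters, and the use of plain bisection for each one-variable sub-problem are assumptions made for illustration; this is not the paper's own sub-problem solver, and it ignores the numerical safeguards the abstract alludes to.

```python
import math

def dual_cd_logreg(X, y, C=1.0, epochs=100, bisect_steps=40):
    """Illustrative dual coordinate descent for L2-regularized logistic regression.

    X: list of feature lists, y: list of labels in {-1, +1}.
    Returns (w, alpha) where w = sum_i alpha_i * y_i * x_i.
    """
    n, d = len(X), len(X[0])
    alpha = [0.5 * C] * n  # start strictly inside (0, C) so the log terms are finite
    w = [sum(alpha[i] * y[i] * X[i][j] for i in range(n)) for j in range(d)]
    for _epoch in range(epochs):
        for i in range(n):
            xi, a_old = X[i], alpha[i]
            q_ii = sum(v * v for v in xi)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # The derivative of the one-variable sub-problem,
            #   q_ii * (z - a_old) + margin + log(z / (C - z)),
            # is strictly increasing on (0, C), so bisection finds its root.
            lo, hi = 0.0, C
            for _step in range(bisect_steps):
                z = 0.5 * (lo + hi)
                g = q_ii * (z - a_old) + margin + math.log(z / (C - z))
                if g > 0.0:
                    hi = z
                else:
                    lo = z
            a_new = 0.5 * (lo + hi)
            # Keep w consistent with the updated alpha_i.
            delta = a_new - a_old
            for j in range(d):
                w[j] += delta * y[i] * xi[j]
            alpha[i] = a_new
    return w, alpha
```

At a dual optimum each α_i equals C / (1 + exp(y_i wᵀx_i)), so the returned w should make the primal gradient w − C Σ y_i x_i / (1 + exp(y_i wᵀx_i)) vanish, which gives a simple sanity check on small data.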
Pages: 41-75
Page count: 35