Logistic Regression: From Art to Science

Cited by: 56
Authors
Bertsimas, Dimitris [1, 2]
King, Angela [3]
Affiliations
[1] MIT, Operat Res Ctr, Cambridge, MA 02139 USA
[2] MIT, Sloan Sch Management, Cambridge, MA 02139 USA
[3] End End Analyt, San Francisco, CA USA
Keywords
Logistic regression; computational statistics; mixed integer nonlinear optimization; SUBSET-SELECTION; GROUP LASSO
DOI
10.1214/16-STS602
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
A high quality logistic regression model contains various desirable properties: predictive power, interpretability, significance, robustness to error in data and sparsity, among others. To achieve these competing goals, modelers incorporate these properties iteratively as they hone in on a final model. In the period 1991-2015, algorithmic advances in Mixed-Integer Linear Optimization (MILO) coupled with hardware improvements have resulted in an astonishing 450 billion factor speedup in solving MILO problems. Motivated by this speedup, we propose modeling logistic regression problems algorithmically with a mixed integer nonlinear optimization (MINLO) approach in order to explicitly incorporate these properties in a joint, rather than sequential, fashion. The resulting MINLO is flexible and can be adjusted based on the needs of the modeler. Using both real and synthetic data, we demonstrate that the overall approach is generally applicable and provides high quality solutions in realistic timelines as well as a guarantee of suboptimality. When the MINLO is infeasible, we obtain a guarantee that imposing distinct statistical properties is simply not feasible.
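As a rough illustration of the joint formulation the abstract describes, a sparsity-constrained logistic regression can be written as a mixed integer nonlinear problem. The notation is assumed for this sketch and is not taken from the record: data $(x_i, y_i)$ with $y_i \in \{-1, 1\}$, coefficients $\beta \in \mathbb{R}^p$, binary indicators $z_j$, a sparsity budget $k$, and a big-M bound $M$. Other properties named in the abstract, such as robustness or significance, would presumably enter as additional constraints or terms.

```latex
\begin{aligned}
\min_{\beta,\, z} \quad & \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\, \beta^{\top} x_i\right)\right) \\
\text{s.t.} \quad & -M z_j \le \beta_j \le M z_j, \quad j = 1, \dots, p, \\
& \sum_{j=1}^{p} z_j \le k, \\
& z_j \in \{0, 1\}, \quad j = 1, \dots, p.
\end{aligned}
```

Here $z_j = 0$ forces $\beta_j = 0$, so the constraint on $\sum_j z_j$ caps the number of nonzero coefficients at $k$ while the fit is optimized jointly rather than in a separate selection step.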
Pages: 367-384
Number of pages: 18