Variable selection via a combination of the L0 and L1 penalties

Times cited: 57
Authors
Liu, Yufeng [1]
Wu, Yichao [2]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Carolina Ctr Genome Sci, Chapel Hill, NC 27599 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health;
Keywords
mixed integer programming; regression; regularization; SVM; variable selection;
DOI
10.1198/106186007X255676
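Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208; 070103; 0714;
Abstract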
Variable selection is an important aspect of high-dimensional statistical modeling, particularly in regression and classification. In the regularization framework, various penalty functions are used to perform variable selection by putting relatively large penalties on small coefficients. The L1 penalty is a popular choice because of its convexity, but it produces biased estimates for the large coefficients. The L0 penalty is attractive for variable selection because it directly penalizes the number of nonzero coefficients. However, the optimization involved is discontinuous and nonconvex, and therefore it is very challenging to implement. Moreover, its solution may not be stable. In this article, we propose a new penalty that combines the L0 and L1 penalties. We implement this new penalty by developing a global optimization algorithm using mixed integer programming (MIP). We compare this combined penalty with several other penalties via simulated examples as well as real applications. The results show that the new penalty outperforms both the L0 and L1 penalties in terms of variable selection while maintaining good prediction accuracy.
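The abstract does not spell out the exact form of the combined penalty or of the MIP formulation, so the following is only a minimal sketch under stated assumptions. It assumes squared-error loss and a simple additive penalty λ0·‖β‖0 + λ1·‖β‖1, which may differ from the authors' formulation. It also swaps the authors' MIP solver for brute-force enumeration of supports: once the support is fixed, what remains is a convex lasso problem, so trying every support yields the global minimizer, but only for small p. The names `combined_penalty`, `lasso_cd`, and `best_subset_l0_l1` are hypothetical.

```python
import itertools

import numpy as np


def combined_penalty(beta, lam0, lam1):
    """Assumed additive L0 + L1 penalty: lam0 * (#nonzero) + lam1 * sum |beta_j|."""
    beta = np.asarray(beta, dtype=float)
    return lam0 * np.count_nonzero(beta) + lam1 * np.abs(beta).sum()


def lasso_cd(X, y, lam1, n_iter=1000, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam1 * ||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float).copy()  # residual y - X b, with b = 0 at the start
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes j.
            rho = X[:, j] @ r + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new_bj = np.sign(rho) * max(abs(rho) - lam1, 0.0) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def best_subset_l0_l1(X, y, lam0, lam1):
    """Globally minimize 0.5*RSS + lam0*||b||_0 + lam1*||b||_1 by enumerating supports.

    Exact because every support is tried; the cost grows as 2**p, so this is a
    small-p stand-in for a mixed integer programming solver, not a replacement.
    """
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    best_obj, best_beta = np.inf, np.zeros(p)
    for k in range(p + 1):
        for support in itertools.combinations(range(p), k):
            beta = np.zeros(p)
            if support:
                idx = list(support)
                # With the support fixed, the remaining problem is a convex lasso.
                beta[idx] = lasso_cd(X[:, idx], y, lam1)
            obj = 0.5 * np.sum((y - X @ beta) ** 2) + combined_penalty(beta, lam0, lam1)
            if obj < best_obj:
                best_obj, best_beta = obj, beta
    return best_beta, best_obj


# Toy example: 6 candidate predictors, only the first two are active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)

beta_hat, obj = best_subset_l0_l1(X, y, lam0=5.0, lam1=1.0)
print("selected:", np.flatnonzero(beta_hat))
print("estimates:", np.round(beta_hat, 3))
```

In this sketch the two tuning parameters play different roles. λ0 charges a fixed price for each included variable, so it governs which variables enter regardless of their magnitude. λ1 shrinks the included coefficients, and that shrinkage is the source of the bias on large coefficients that the abstract attributes to the L1 penalty. Enumeration keeps the example self-contained; for realistic p, the abstract's approach is to solve the problem with mixed integer programming instead.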
Pages: 782-798
Page count: 17