Sparse Partially Linear Additive Models

Cited by: 43
Authors
Lou, Yin [1 ]
Bien, Jacob [2 ,3 ]
Caruana, Rich [4 ]
Gehrke, Johannes [1 ]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
[2] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14850 USA
[3] Cornell Univ, Dept Stat Sci, Ithaca, NY 14850 USA
[4] Microsoft Corp, Microsoft Res, Redmond, WA 98052 USA
Keywords
Classification; Generalized partially linear additive models; Group lasso; Regression; Sparsity; Selection; Shrinkage
DOI
10.1080/10618600.2015.1089775
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208; 070103; 0714
Abstract
The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, the choice of which features to treat as linear or nonlinear is typically assumed known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which of these features to treat nonlinearly. We introduce the sparse partially linear additive model (SPLAM), which combines model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p >> N) setting. We develop efficient algorithms that are applied to real datasets with half a million samples and over 45,000 features with excellent predictive performance. Supplementary materials for this article are available online.
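As a rough sketch of the single convex optimization problem described above (the notation below is assumed for illustration, not taken verbatim from the paper): expand each feature $x_j$ in a $K$-dimensional basis $\Psi_j$ whose first column is $x_j$ itself, then fit all coefficients under a hierarchical group-lasso penalty,

\[
\min_{\beta}\ \frac{1}{2n}\Bigl\|\,y-\sum_{j=1}^{p}\Psi_j\beta_j\Bigr\|_2^2
\;+\;\lambda\sum_{j=1}^{p}\Bigl(\|\beta_j\|_2+\gamma\,\|\beta_{j,2:K}\|_2\Bigr),
\]

so that $\beta_j=0$ excludes feature $j$ entirely, $\beta_{j,2:K}=0$ with $\beta_{j,1}\neq 0$ yields a purely linear effect, and an unrestricted $\beta_j$ yields a nonlinear effect. The nested penalty enforces the hierarchy (a nonzero nonlinear part implies the feature is in the model), which is how exclusion, linearity, and nonlinearity are decided jointly within one convex problem; $\lambda$, $\gamma$, and $K$ here are illustrative tuning parameters.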
Pages: 1026-1040
Page count: 15