Learning Interactions via Hierarchical Group-Lasso Regularization

Cited by: 186
Authors
Lim, Michael [1]
Hastie, Trevor [2]
Affiliations
[1] LinkedIn, Mountain View, CA 94043 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Logistic; Regression; Computer intensive; REGRESSION; SHRINKAGE; SELECTION;
DOI
10.1080/10618600.2014.938812
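Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract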
We introduce a method for learning pairwise interactions in a linear regression or logistic regression model in a manner that satisfies strong hierarchy: whenever an interaction is estimated to be nonzero, both its associated main effects are also included in the model. We motivate our approach by modeling pairwise interactions for categorical variables with arbitrary numbers of levels, and then show how we can accommodate continuous variables as well. Our approach allows us to dispense with explicitly applying constraints on the main effects and interactions for identifiability, which results in interpretable interaction models. We compare our method with existing approaches on both simulated and real data, including a genome-wide association study, all using our R package glinternet.
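As a rough illustration of the strong-hierarchy property described in the abstract, the following Python sketch checks whether a set of selected main effects and pairwise interactions satisfies it. The function name `satisfies_strong_hierarchy` and the variable names are hypothetical and are not part of the glinternet package; the sketch only tests the property and does not fit a model.

```python
from typing import Iterable, Set, Tuple


def satisfies_strong_hierarchy(
    main_effects: Iterable[str],
    interactions: Iterable[Tuple[str, str]],
) -> bool:
    """Return True if every selected interaction has both of its main effects selected."""
    selected: Set[str] = set(main_effects)
    return all(a in selected and b in selected for a, b in interactions)


# Hypothetical selections, for illustration only.
print(satisfies_strong_hierarchy(["x1", "x2", "x3"], [("x1", "x2")]))  # True
print(satisfies_strong_hierarchy(["x1"], [("x1", "x2")]))  # False
```

A model with no interactions trivially satisfies the property, since there is nothing to check.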
Pages: 627-654
Number of pages: 28