VARIABLE SELECTION FOR GENERAL INDEX MODELS VIA SLICED INVERSE REGRESSION

Cited by: 40
Authors
Jiang, Bo [1]
Liu, Jun S. [1]
Affiliations
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA);
Keywords
Interactions; inverse models; sliced inverse regression; sure independence screening; variable selection; SUFFICIENT DIMENSION REDUCTION; EMBRYONIC STEM-CELLS; ORACLE PROPERTIES; EXPRESSION; LASSO; LIKELIHOOD; DISCOVERY; CANCER;
DOI
10.1214/14-AOS1233
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Variable selection, also known as feature selection in machine learning, plays an important role in modeling high-dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential variables under the general index model, in which the response depends on the predictors through an unknown function of one or more linear combinations of them. Instead of building a predictive model of the response given combinations of predictors, we model the conditional distribution of predictors given the response. This inverse modeling perspective motivates us to propose a stepwise procedure based on likelihood-ratio tests, which is effective and computationally efficient in identifying important variables without specifying a parametric relationship between predictors and the response. For example, the proposed procedure is able to detect variables with pairwise, three-way or even higher-order interactions among p predictors with a computational time of O(p) instead of O(p^k) (with k being the highest order of interactions). Its excellent empirical performance in comparison with existing methods is demonstrated through simulation studies as well as real data examples. Consistency of the variable selection procedure when both the number of predictors and the sample size go to infinity is established.
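The inverse modeling idea in the abstract can be illustrated with a small sketch. This is not the authors' likelihood-ratio procedure; it is a simplified first-moment screen in the spirit of sliced inverse regression, written under our own assumptions. The function name `slice_mean_scores`, the choice of 10 slices, and the simulated single-index model are all hypothetical and are not taken from the paper. The response is sorted and cut into slices, and each predictor is scored by the fraction of its variance explained by its slice means, which takes one pass per predictor.

```python
import math
import random


def slice_mean_scores(X, y, n_slices=10):
    """Score each predictor by the share of its variance explained by
    its means within slices of the sorted response (illustrative only)."""
    n, p = len(X), len(X[0])
    # Sort observation indices by response and cut into contiguous slices.
    order = sorted(range(n), key=lambda i: y[i])
    bounds = [round(h * n / n_slices) for h in range(n_slices + 1)]
    slices = [order[bounds[h]:bounds[h + 1]] for h in range(n_slices)]
    slices = [idx for idx in slices if idx]  # drop empty slices if n is small

    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        # Between-slice sum of squares of predictor j given the sliced response.
        between = sum(
            len(idx) * (sum(col[i] for i in idx) / len(idx) - mean) ** 2
            for idx in slices
        )
        scores.append(between / total)
    return scores


# Hypothetical simulation: only predictors 0 and 1 affect the response,
# through an unknown (here exponential) function of one linear combination.
rng = random.Random(0)
n, p = 1000, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [math.exp(row[0] + row[1]) + 0.5 * rng.gauss(0.0, 1.0) for row in X]

scores = slice_mean_scores(X, y)
top2 = sorted(sorted(range(p), key=lambda j: scores[j])[-2:])
print(top2)
```

A screen based on slice means alone cannot detect effects that appear only in second or higher moments, such as a pure interaction between two symmetric predictors, which is one of the cases the abstract says the proposed procedure addresses.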
Pages: 1751-1786
Page count: 36
Related papers
28 in total
[1]   A LASSO FOR HIERARCHICAL INTERACTIONS [J].
Bien, Jacob ;
Taylor, Jonathan ;
Tibshirani, Robert .
ANNALS OF STATISTICS, 2013, 41 (03) :1111-1141
[2]  
Chen CH, 1998, STAT SINICA, V8, P289
[3]   Integration of external signaling pathways with the core transcriptional network in embryonic stem cells [J].
Chen, Xi ;
Xu, Han ;
Yuan, Ping ;
Fang, Fang ;
Huss, Mikael ;
Vega, Vinsensius B. ;
Wong, Eleanor ;
Orlov, Yuriy L. ;
Zhang, Weiwei ;
Jiang, Jianming ;
Loh, Yuin-Han ;
Yeo, Hock Chuan ;
Yeo, Zhen Xuan ;
Narang, Vipin ;
Govindarajan, Kunde Ramamoorthy ;
Leong, Bernard ;
Shahab, Atif ;
Ruan, Yijun ;
Bourque, Guillaume ;
Sung, Wing-Kin ;
Clarke, Neil D. ;
Wei, Chia-Lin ;
Ng, Huck-Hui .
CELL, 2008, 133 (06) :1106-1117
[4]   Stem cell transcriptome profiling via massive-scale mRNA sequencing [J].
Cloonan, Nicole ;
Forrest, Alistair R. R. ;
Kolle, Gabriel ;
Gardiner, Brooke B. A. ;
Faulkner, Geoffrey J. ;
Brown, Mellissa K. ;
Taylor, Darrin F. ;
Steptoe, Anita L. ;
Wani, Shivangi ;
Bethel, Graeme ;
Robertson, Alan J. ;
Perkins, Andrew C. ;
Bruce, Stephen J. ;
Lee, Clarence C. ;
Ranade, Swati S. ;
Peckham, Heather E. ;
Manning, Jonathan M. ;
McKernan, Kevin J. ;
Grimmond, Sean M. .
NATURE METHODS, 2008, 5 (07) :613-619
[5]   Fisher lecture: Dimension reduction in regression [J].
Cook, R. Dennis .
STATISTICAL SCIENCE, 2007, 22 (01) :1-26
[6]   Testing predictor contributions in sufficient dimension reduction [J].
Cook, RD .
ANNALS OF STATISTICS, 2004, 32 (03) :1062-1092
[7]   Least angle regression - Rejoinder [J].
Efron, B ;
Hastie, T ;
Johnstone, I ;
Tibshirani, R .
ANNALS OF STATISTICS, 2004, 32 (02) :494-499
[8]   The sliced inverse regression algorithm as a maximum likelihood procedure [J].
Eugenia Szretter, Maria ;
Jaime Yohai, Victor .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2009, 139 (10) :3570-3578
[9]   Sure independence screening for ultrahigh dimensional feature space [J].
Fan, Jianqing ;
Lv, Jinchi .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 :849-883
[10]   Variable selection via nonconcave penalized likelihood and its oracle properties [J].
Fan, JQ ;
Li, RZ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1348-1360