Variable selection, also known as feature selection in machine learning, plays an important role in modeling high-dimensional data and is key to data-driven scientific discovery. We consider the problem of detecting influential variables under the general index model, in which the response depends on the predictors through an unknown function of one or more linear combinations of them. Instead of building a predictive model of the response given combinations of predictors, we model the conditional distribution of the predictors given the response. This inverse modeling perspective motivates a stepwise procedure based on likelihood-ratio tests that identifies important variables effectively and with little computation, without specifying a parametric relationship between the predictors and the response. For example, the proposed procedure can detect variables involved in pairwise, three-way or even higher-order interactions among p predictors with a computational cost of O(p) instead of O(p^k), where k is the highest order of interactions. Simulation studies and real data examples demonstrate its excellent empirical performance in comparison with existing methods. We also establish the consistency of the variable selection procedure when both the number of predictors and the sample size go to infinity.
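The abstract does not spell out the test statistic, so the following is only an illustrative sketch of the inverse-modeling idea under our own simplifying assumptions, not the authors' procedure. The response is cut into n_slices equal-count slices; for a candidate predictor X_j given the already selected set A, a null Gaussian model in which X_j | X_A does not depend on the slice is compared with an alternative that gives each slice its own intercept, slopes and variance. The function names, the slicing scheme and the BIC-style stopping rule (stop when the best statistic falls below df * log n) are all hypothetical choices made for this example, and it assumes n is much larger than n_slices.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(cols, z):
    """Residual sum of squares from regressing z on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in cols] for i in range(len(z))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    for a in range(k):
        xtx[a][a] += 1e-10  # tiny ridge guards against exactly singular systems
    xtz = [sum(r[a] * zi for r, zi in zip(rows, z)) for a in range(k)]
    beta = solve(xtx, xtz)
    return sum((zi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, zi in zip(rows, z))


def slice_labels(y, n_slices):
    """Assign each observation to one of n_slices equal-count slices by the rank of y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def lr_stat(X, labels, selected, j, n_slices):
    """2 * log-likelihood ratio for 'X_j given X_A depends on the slice' (A = selected)."""
    n = len(labels)
    xa = [X[k] for k in selected]
    # Null: one regression of X_j on X_A with a single variance shared by all slices.
    stat = n * math.log(rss(xa, X[j]) / n)
    # Alternative: each slice gets its own intercept, slopes and variance.
    for h in range(n_slices):
        idx = [i for i in range(n) if labels[i] == h]
        sub_xa = [[col[i] for i in idx] for col in xa]
        sub_z = [X[j][i] for i in idx]
        stat -= len(idx) * math.log(rss(sub_xa, sub_z) / len(idx))
    return stat


def forward_select(X, y, n_slices=5, max_vars=None):
    """Greedy forward selection: add the predictor with the largest LR statistic each step."""
    n, p = len(y), len(X)
    labels = slice_labels(y, n_slices)
    selected = []
    limit = p if max_vars is None else max_vars
    while len(selected) < limit:
        remaining = [j for j in range(p) if j not in selected]
        scores = {j: lr_stat(X, labels, selected, j, n_slices) for j in remaining}
        best = max(scores, key=scores.get)
        df = (n_slices - 1) * (len(selected) + 2)  # extra parameters in the alternative
        if scores[best] < df * math.log(n):  # hypothetical BIC-style stopping rule
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    random.seed(0)
    n, p = 300, 10
    X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]  # X[j] is column j
    y = [3 * X[0][i] + random.gauss(0, 1) for i in range(n)]  # only X_0 matters
    print(forward_select(X, y))
```

Each forward step scores every remaining predictor once, so the work per step grows linearly in p regardless of the interaction order being targeted. Because the alternative lets the variance of X_j differ across slices, the statistic can respond to a change in spread and not only in mean, which is how a pure interaction such as y = x0 * x1 can leave a trace in the distribution of the predictors given the response.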
Affiliations:
Bien, Jacob: Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14853 USA; Cornell Univ, Dept Stat Sci, Ithaca, NY 14853 USA
Taylor, Jonathan: Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Tibshirani, Robert: Stanford Univ, Dept Stat, Stanford, CA 94305 USA; Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Affiliations:
Eugenia Szretter, Maria: Univ Buenos Aires, Dept Matemat, RA-1428 Buenos Aires, DF, Argentina
Jaime Yohai, Victor: Univ Buenos Aires, Dept Matemat, RA-1428 Buenos Aires, DF, Argentina; Consejo Nacl Invest Cient & Tecn, RA-1033 Buenos Aires, DF, Argentina