SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models

Cited by: 53
Authors
Saldana, Diego Franco [1]
Feng, Yang [1]
Affiliation
[1] Columbia University, Department of Statistics, New York, NY 10027, USA
Keywords
Cox model; generalized linear models; penalized likelihood estimation; sparsity; sure independence screening; variable selection; GENERALIZED LINEAR-MODELS; VARIABLE SELECTION; PENALIZED LIKELIHOOD; CANCER; CLASSIFICATION; REGRESSION; REGULARIZATION; CRITERIA
DOI
10.18637/jss.v083.i02
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
We revisit sure independence screening procedures for variable selection in generalized linear models and the Cox proportional hazards model. Through the publicly available R package SIS, we provide a unified environment for carrying out variable selection with iterative sure independence screening (ISIS) and all of its variants. For the regularization steps in the ISIS recruiting process, the available penalties are the LASSO, SCAD, and MCP, while the implemented variants of the screening steps are sample splitting, data-driven thresholding, and combinations thereof. The performance of these feature selection techniques is investigated on real and simulated data sets, where we find that our algorithms offer considerable improvements in model selection and computational time over traditional penalized pseudo-likelihood methods applied directly to the full set of covariates.
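The screening step described in the abstract can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the R package's implementation: it assumes a linear model, uses the absolute marginal Pearson correlation as the marginal utility, and keeps the d = ⌊n / log n⌋ top-ranked covariates (a commonly used default in the SIS literature, assumed here). The function names `sis_screen` and `sis_split_intersect` are hypothetical. The second function sketches the sample-splitting variant by screening two random halves of the data separately and keeping only the covariates selected in both. The penalized step (LASSO, SCAD, or MCP) and the iterations of ISIS are omitted.

```python
import numpy as np


def sis_screen(X, y, d=None):
    """Vanilla SIS sketch: keep the d covariates with the largest |marginal correlation|.

    Assumption: linear model, so the marginal utility is |corr(X_j, y)|.
    d defaults to floor(n / log n).
    """
    n, p = X.shape
    if d is None:
        d = int(np.floor(n / np.log(n)))
    d = max(1, min(d, p))
    # Standardize (population sd), so Xs.T @ ys / n equals the Pearson correlations
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / n
    # Indices of the d largest marginal utilities, returned in ascending index order
    return np.sort(np.argsort(omega)[::-1][:d])


def sis_split_intersect(X, y, d=None, seed=0):
    """Sample-splitting variant sketch: screen two random halves, keep the intersection."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]
    selected1 = sis_screen(X[half1], y[half1], d)
    selected2 = sis_screen(X[half2], y[half2], d)
    return np.intersect1d(selected1, selected2)


# Usage on simulated data: only covariates 0, 1, 2 are truly active (p >> n)
rng = np.random.default_rng(1)
n, p = 200, 2000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(n)

kept = sis_screen(X, y)
split = sis_split_intersect(X, y, seed=0)
print("vanilla SIS kept", len(kept), "covariates; active ones kept:",
      sorted({0, 1, 2} & set(kept.tolist())))
print("split-and-intersect kept", len(split), "covariates")
```

Screening shrinks the candidate set from thousands of covariates to a few dozen before any penalized fit is attempted, which is where much of the computational saving comes from. The intersection variant trades some sensitivity for fewer false positives, since a noise covariate must rank highly in both halves to survive.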
Pages: 1-25
Number of pages: 25