Forward Variable Selection for Sparse Ultra-High Dimensional Varying Coefficient Models

Cited by: 43
Authors
Cheng, Ming-Yen [1]
Honda, Toshio [2]
Zhang, Jin-Ting [3]
Affiliations
[1] Natl Taiwan Univ, Dept Math, Taipei 106, Taiwan
[2] Hitotsubashi Univ, Grad Sch Econ, Tokyo, Japan
[3] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore, Singapore
Keywords
BIC; B-spline; Independence screening; Marginal model; Sub-Gaussian error; BAYESIAN INFORMATION CRITERION; ORACLE PROPERTIES; LONGITUDINAL DATA; DANTZIG SELECTOR; ADDITIVE-MODELS; FEATURE SPACE; REGRESSION; LASSO; SHRINKAGE; LIKELIHOOD
DOI
10.1080/01621459.2015.1080708
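Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract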
Varying coefficient models have numerous applications in a wide scope of scientific areas. While enjoying nice interpretability, they also allow for flexibility in modeling dynamic impacts of the covariates. But, in the new era of big data, it is challenging to select the relevant variables when the dimensionality is very large. Recently, several works are focused on this important problem based on sparsity assumptions; they are subject to some limitations, however. We introduce an appealing forward selection procedure. It selects important variables sequentially according to a reduction in sum of squares criterion and it employs a Bayesian information criterion (BIC)-based stopping rule. Clearly, it is simple to implement and fast to compute, and possesses many other desirable properties from theoretical and numerical viewpoints. The BIC is a special case of the extended BIC (EBIC) when an extra tuning parameter in the latter vanishes. We establish rigorous screening consistency results when either BIC or EBIC is used as the stopping criterion. The theoretical results depend on some conditions on the eigenvalues related to the design matrices, which can be relaxed in some situations. Results of an extensive simulation study and a real data example are also presented to show the efficacy and usefulness of our procedure. Supplementary materials for this article are available online.
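The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a model of the form y = sum_j beta_j(t) x_j + error, swaps the paper's B-spline basis for a plain polynomial basis in the index variable t to stay NumPy-only, omits an intercept, and uses one common form of the criterion, EBIC = log(RSS/n) + k L (log n + 2 eta log p) / n, which reduces to BIC when eta = 0. The names `basis`, `rss`, `ebic`, and `forward_select` and the simulated data are hypothetical.

```python
import numpy as np

def basis(t, L=4):
    """Polynomial basis 1, t, ..., t^(L-1); an assumed stand-in for B-splines."""
    return np.vander(t, L, increasing=True)

def rss(y, W):
    """Residual sum of squares of the least-squares fit of y on the columns of W."""
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return float(resid @ resid)

def ebic(rss_val, n, k, L, p, eta):
    """Assumed EBIC form; eta = 0 gives the ordinary BIC."""
    return np.log(rss_val / n) + k * L * (np.log(n) + 2.0 * eta * np.log(p)) / n

def forward_select(X, t, y, L=4, eta=0.0, max_steps=10):
    """Add, one at a time, the covariate giving the largest RSS reduction;
    stop as soon as the criterion no longer decreases."""
    n, p = X.shape
    B = basis(t, L)
    # Block j holds the L columns x_j * B_l(t), which together model beta_j(t) * x_j.
    blocks = [X[:, [j]] * B for j in range(p)]
    selected = []
    current = ebic(float(y @ y), n, 0, L, p, eta)  # empty model, no intercept
    for _ in range(max_steps):
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        # Smallest RSS after adding j is the same as largest reduction in RSS.
        scores = {
            j: rss(y, np.hstack([blocks[i] for i in selected + [j]]))
            for j in candidates
        }
        best = min(scores, key=scores.get)
        new = ebic(scores[best], n, len(selected) + 1, L, p, eta)
        if new >= current:
            break
        selected.append(best)
        current = new
    return selected

# Simulated example (hypothetical data): only covariates 2 and 7 are relevant.
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
t = rng.uniform(0.0, 1.0, n)
y = (2.0 + t**2) * X[:, 2] + (-2.0 + t) * X[:, 7] + 0.5 * rng.standard_normal(n)

selected = forward_select(X, t, y, L=4, eta=1.0)
print(sorted(selected))
```

Refitting the full least-squares problem for every candidate keeps the sketch short. A practical implementation would more likely update the fit incrementally, and the basis dimension L and the tuning parameter eta would need to be chosen with care.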
Pages: 1209-1221
Number of pages: 13