VARIABLE SELECTION AND ESTIMATION IN HIGH-DIMENSIONAL VARYING-COEFFICIENT MODELS

Times cited: 106
Authors
Wei, Fengrong [1]
Huang, Jian [2]
Li, Hongzhe [3]
Affiliations
[1] Univ W Georgia, Dept Math, Carrollton, GA 30118 USA
[2] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
[3] Univ Penn, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
Keywords
Basis expansion; group Lasso; high-dimensional data; nonparametric coefficient function; selection consistency; sparsity; spline estimation; adaptive Lasso; cell cycle; regression; inference
DOI
10.5705/ss.2009.316
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Nonparametric varying coefficient models are useful for studying the time-dependent effects of variables. Many procedures have been developed for estimation and variable selection in such models. However, existing work has focused on the case where the number of variables is fixed or smaller than the sample size. In this paper, we consider the problem of variable selection and estimation in varying coefficient models in sparse, high-dimensional settings where the number of variables can be larger than the sample size. We apply the group Lasso and basis function expansion to simultaneously select the important variables and estimate the nonzero varying coefficient functions. Under appropriate conditions, we show that the group Lasso selects a model of the right order of dimensionality, selects all variables whose coefficient functions have norms greater than a certain threshold level, and is estimation consistent. However, the group Lasso is in general not selection consistent and tends to select variables that are not important in the model. To improve the selection results, we apply the adaptive group Lasso. We show that, under suitable conditions, the adaptive group Lasso has the oracle selection property in the sense that it correctly selects the important variables with probability converging to one. In contrast, the group Lasso does not possess such an oracle property. Both approaches are evaluated using simulations and demonstrated on a data example.
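The two-stage procedure summarized in the abstract (basis expansion plus group Lasso, then an adaptive group Lasso refit) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the model y_i = sum_j x_ij beta_j(t_i) + e_i with one observation time per subject; it swaps a truncated-linear spline basis in for B-splines; it penalizes the Euclidean norm of each fitted group component ||Z_j gamma_j|| (each group is orthonormalized by QR so every block update is a closed-form group soft-threshold); and it uses adaptive weights w_j = 1/||theta_j|| from the first stage, with w_j infinite (variable dropped) when the first stage zeroes a group. The function names (`spline_basis`, `design`, `group_lasso`, `adaptive_group_lasso`), the simulated data, the knots, and the tuning values `lam1` and `lam2` are all hypothetical; the paper's exact penalty scaling and tuning-parameter selection may differ.

```python
import numpy as np


def spline_basis(t, knots):
    """Truncated-linear spline basis [1, t, (t - k)_+ for k in knots] (assumed stand-in for B-splines)."""
    cols = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)


def design(X, t, knots):
    """Group j holds the columns x_j * B(t); returns the (n, p*K) design matrix and K."""
    B = spline_basis(t, knots)
    p = X.shape[1]
    return np.hstack([X[:, [j]] * B for j in range(p)]), B.shape[1]


def group_lasso(Z, y, p, K, lam, weights=None, n_iter=500, tol=1e-8):
    """Minimize 0.5*||y - Z gamma||^2 + lam * sum_j w_j * ||Z_j gamma_j||_2 by block coordinate descent."""
    weights = np.ones(p) if weights is None else weights
    # Orthonormalize each group: Z_j = Q_j R_j, so ||Z_j gamma_j|| = ||theta_j|| with theta_j = R_j gamma_j.
    factors = [np.linalg.qr(Z[:, j * K:(j + 1) * K]) for j in range(p)]
    Qs = [f[0] for f in factors]
    Rs = [f[1] for f in factors]
    theta = np.zeros((p, K))
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = theta[j].copy()
            resid += Qs[j] @ old                   # partial residual with group j removed
            s = Qs[j].T @ resid
            norm_s = np.linalg.norm(s)
            if np.isinf(weights[j]) or norm_s <= lam * weights[j]:
                new = np.zeros(K)                  # group soft-threshold zeroes the whole block
            else:
                new = (1.0 - lam * weights[j] / norm_s) * s
            theta[j] = new
            resid -= Qs[j] @ new
            max_change = max(max_change, np.max(np.abs(new - old)))
        if max_change < tol:
            break
    # Map back to the spline coefficients: gamma_j = R_j^{-1} theta_j.
    gamma = np.array([np.linalg.solve(R, th) for R, th in zip(Rs, theta)])
    return gamma, theta


def adaptive_group_lasso(Z, y, p, K, lam1, lam2):
    """Stage 1: group Lasso. Stage 2: reweighted group Lasso with w_j = 1/||theta_j|| (inf if zero)."""
    _, theta0 = group_lasso(Z, y, p, K, lam1)
    with np.errstate(divide="ignore"):
        w = 1.0 / np.linalg.norm(theta0, axis=1)
    gamma, theta = group_lasso(Z, y, p, K, lam2, weights=w)
    return theta0, gamma, theta


# Toy simulation (hypothetical settings): only beta_0 and beta_1 are nonzero.
rng = np.random.default_rng(0)
n, p = 300, 10
t = rng.uniform(0.0, 1.0, n)
X = rng.normal(size=(n, p))
y = X[:, 0] * 2 * np.sin(2 * np.pi * t) + X[:, 1] * (1 + 2 * t) + 0.5 * rng.normal(size=n)

knots = [0.2, 0.4, 0.6, 0.8]
Z, K = design(X, t, knots)
theta0, gamma, theta = adaptive_group_lasso(Z, y, p, K, lam1=6.0, lam2=4.0)
selected_gl = np.flatnonzero(np.linalg.norm(theta0, axis=1) > 0)
selected_agl = np.flatnonzero(np.linalg.norm(theta, axis=1) > 0)
print("group Lasso:", selected_gl, "adaptive group Lasso:", selected_agl)

# Estimated coefficient function beta_0(t) = B(t) @ gamma_0 on a grid.
grid = np.linspace(0.05, 0.95, 19)
beta0_hat = spline_basis(grid, knots) @ gamma[0]
```

The second stage is where the selection behavior described in the abstract shows up in this sketch: a group zeroed in stage 1 receives an infinite weight and stays out, while a large first-stage norm yields a small weight and hence little shrinkage of an important variable. In practice `lam1` and `lam2` would be tuned, for example by cross-validation or an information criterion, rather than fixed by hand as here.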
Pages: 1515-1540
Number of pages: 26