Penalized Generalized Estimating Equations for High-Dimensional Longitudinal Data Analysis

Cited by: 158
Authors
Wang, Lan [1]
Zhou, Jianhui [2]
Qu, Annie [3]
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Univ Virginia, Dept Stat, Charlottesville, VA 22904 USA
[3] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
Funding
US National Science Foundation;
Keywords
Correlated data; Diverging number of parameters; GEE; High-dimensional covariates; Longitudinal data; Marginal regression; Variable selection; Gene-expression data; Model selection; Diverging number; Likelihood
DOI
10.1111/j.1541-0420.2011.01678.x
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
We consider the penalized generalized estimating equations (GEEs) for analyzing longitudinal data with high-dimensional covariates, which often arise in microarray experiments and large-scale health studies. Existing high-dimensional regression procedures often assume independent data and rely on the likelihood function. Construction of a feasible joint likelihood function for high-dimensional longitudinal data is challenging, particularly for correlated discrete outcome data. The penalized GEE procedure only requires specifying the first two marginal moments and a working correlation structure. We establish the asymptotic theory in a high-dimensional framework where the number of covariates p_n increases as the number of clusters n increases, and p_n can reach the same order as n. One important feature of the new procedure is that the consistency of model selection holds even if the working correlation structure is misspecified. We evaluate the performance of the proposed method using Monte Carlo simulations and demonstrate its application using a yeast cell-cycle gene expression data set.
Pages: 353-360
Page count: 8