Ultra high-dimensional semiparametric longitudinal data analysis

Cited by: 6
Authors
Green, Brittany [1]
Lian, Heng [2]
Yu, Yan [3]
Zu, Tianhai [3]
Affiliations
[1] Univ Louisville, Dept Comp Informat Syst, Louisville, KY 40292 USA
[2] City Univ Hong Kong, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
[3] Univ Cincinnati, Dept Operat Business Analyt & Informat Syst, Cincinnati, OH 45221 USA
Keywords
generalized estimating equations; oracle property; polynomial spline; SCAD; single-index model; variable selection; SINGLE-INDEX MODELS; CELL-CYCLE; VARIABLE SELECTION; GENE-EXPRESSION; TRANSCRIPTION; IDENTIFICATION
DOI
10.1111/biom.13348
Chinese Library Classification
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
As ultra high-dimensional longitudinal data are becoming ever more apparent in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as $\exp(n^{1/2})$ with respect to the number of clusters $n$. We consider a flexible semiparametric approach, namely, partially linear single-index models, for ultra high-dimensional longitudinal data. Most importantly, we allow not only the partially linear covariates but also the single-index covariates within the unknown flexible function estimated nonparametrically to be ultra high dimensional. Using penalized generalized estimating equations, this approach can capture correlation within subjects, can perform simultaneous variable selection and estimation with a smoothly clipped absolute deviation penalty, and can capture nonlinearity and potentially some interactions among predictors. We establish asymptotic theory for the estimators including the oracle property in ultra high dimension for both the partially linear and nonparametric components, and we present an efficient algorithm to handle the computational challenges. We show the effectiveness of our method and algorithm via a simulation study and a yeast cell cycle gene expression data.
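The abstract names the smoothly clipped absolute deviation (SCAD) penalty without giving its formula. As a rough illustration only, the sketch below implements the standard SCAD penalty and its derivative as defined by Fan and Li (2001); it is not the authors' algorithm. The function names `scad_penalty` and `scad_derivative` are hypothetical, and the default `a = 3.7` is the value conventionally suggested in the SCAD literature, not a setting confirmed by this record.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty p_lambda(|theta|); assumes lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional penalty
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0
```

The derivative is constant for small coefficients and falls to zero beyond `a * lam`, so small coefficients are shrunk toward zero while large ones are left essentially unpenalized. This is the behavior usually associated with the oracle property mentioned in the abstract.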
Pages: 903-913
Number of pages: 11