Group SCAD regression analysis for microarray time course gene expression data

Cited by: 207
Authors
Wang, Lifeng
Chen, Guang
Li, Hongzhe [1]
Affiliations
[1] Univ Penn, Sch Med, Genome Computat Biol Grad Grp, Philadelphia, PA 19104 USA
[2] Univ Penn, Sch Med, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
[3] Univ Penn, Sch Med, Dept Bioengn, Philadelphia, PA 19104 USA
DOI
10.1093/bioinformatics/btm125
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Many important biological systems and processes are dynamic, so it is important to study gene expression patterns over time on a genomic scale in order to capture the dynamic behavior of gene expression. Microarray technologies have made it possible to measure the expression levels of essentially all genes during a given biological process. To determine the transcription factors (TFs) involved in gene regulation during such a process, we propose a functional response model with varying coefficients to model the transcriptional effects on gene expression levels, together with a group smoothly clipped absolute deviation (SCAD) regression procedure for selecting the TFs with varying coefficients that are involved in gene regulation.
Results: Simulation studies indicated that the procedure is effective in selecting the relevant variables with time-varying coefficients and in estimating the coefficients. Application to the yeast cell cycle microarray time course gene expression data set identified 19 of the 21 known TFs related to the cell cycle process. In addition, we identified another 52 TFs that also have periodic transcriptional effects on gene expression during the cell cycle process. Compared to simple linear regression (SLR) analysis at each time point, our procedure identified more known cell cycle related TFs.
Conclusions: The proposed group SCAD regression procedure is effective for identifying variables with time-varying coefficients, in particular the TFs that are related to gene expression over time. By identifying the TFs that are related to gene expression variations over time, the procedure can potentially provide more insight into gene regulatory networks.
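As a rough illustration of the penalty named in the abstract, the sketch below evaluates the SCAD penalty of Fan and Li (2001) and applies it to the Euclidean norm of each coefficient group, which is the usual way a group version is formed. The function names, the default `a = 3.7`, and the idea that each group holds the coefficients for one TF's time-varying effect are assumptions made for illustration; this is not the authors' implementation and it omits the fitting procedure entirely.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |theta|.

    lam is the tuning parameter; a > 2 controls where the penalty
    flattens out (3.7 is the value commonly suggested).
    """
    theta = abs(theta)
    if theta <= lam:
        # Behaves like the lasso penalty near zero.
        return lam * theta
    if theta <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * theta - theta ** 2 - lam ** 2) / (2 * (a - 1))
    # Constant for large values, so large effects are not shrunk further.
    return lam ** 2 * (a + 1) / 2


def group_scad_penalty(groups, lam, a=3.7):
    """Sum of SCAD penalties applied to the L2 norm of each group.

    groups is a list of coefficient lists; each inner list is assumed
    (hypothetically) to hold the coefficients for one predictor.
    """
    total = 0.0
    for coefs in groups:
        norm = math.sqrt(sum(c * c for c in coefs))
        total += scad_penalty(norm, lam, a)
    return total
```

Because the penalty depends on a group only through its norm, a group is either dropped as a whole (norm zero) or kept as a whole, which is what allows selection of predictors whose effects vary over time.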
Pages: 1486-1494
Number of pages: 9