Gene set analysis using variance component tests

Cited by: 17
Authors
Huang, Yen-Tsung [1]
Lin, Xihong [2]
Affiliations
[1] Brown Univ, Dept Epidemiol, Providence, RI 02912 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
EXPRESSION; ASSOCIATION
DOI
10.1186/1471-2105-14-210
Chinese Library Classification
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Background: Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint contribution of alterations in numerous genes. Genes often act together as a functional repertoire, e.g., a biological pathway or network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among genes. Here we propose to account for this important feature of gene sets to improve statistical power in gene set analyses.
Results: We propose to model the effects of an independent variable, e.g., exposure or biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for gene set effects that assumes a common distribution for the regression coefficients in the multivariate linear regression model, and we calculate p-values using permutation and a scaled chi-square approximation. Simulations show that the type I error is controlled under different choices of working covariance matrix and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS in which the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms two commonly used approaches, the global test and gene set enrichment analysis (GSEA).
Conclusion: We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
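The abstract describes TEGS only in words. As a rough illustration, here is a minimal Python sketch built on our own assumptions; it is not the authors' implementation. It assumes one exposure x with no other covariates, gene-specific intercepts, coefficients beta_j drawn i.i.d. with mean 0 and variance tau, and a fixed p-by-p working covariance W. Under those assumptions, a score-type statistic for H0: tau = 0 has the form Q = || W^{-1} sum_i (x_i - xbar) Y_i ||^2. With W = I this becomes || sum_i (x_i - xbar) Y_i ||^2, the global-test form the abstract names as a special case. All function names (`invert`, `score_statistic`, `permutation_pvalue`, `compound_symmetry`) are hypothetical. Only the permutation p-value is shown; the scaled chi-square approximation is not implemented here.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def invert(a: Sequence[Sequence[float]]) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("working covariance is singular")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def score_statistic(x: Sequence[float], y: Sequence[Sequence[float]],
                    w_inv: Matrix) -> float:
    """Q = || W^{-1} sum_i (x_i - xbar) Y_i ||^2 for an n-vector x and n-by-p matrix y."""
    n, p = len(x), len(y[0])
    xbar = sum(x) / n
    # v_j = sum_i (x_i - xbar) * Y_ij; centring x makes centring Y unnecessary.
    v = [sum((x[i] - xbar) * y[i][j] for i in range(n)) for j in range(p)]
    u = [sum(w_inv[j][k] * v[k] for k in range(p)) for j in range(p)]
    return sum(uj * uj for uj in u)


def permutation_pvalue(x: Sequence[float], y: Sequence[Sequence[float]],
                       working_cov: Sequence[Sequence[float]],
                       n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (Q_obs, p) with p = (1 + #{Q_perm >= Q_obs}) / (n_perm + 1)."""
    w_inv = invert(working_cov)
    q_obs = score_statistic(x, y, w_inv)
    rng = random.Random(seed)
    xs = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(xs)  # permuting labels breaks any x-Y association under H0
        # small tolerance so floating-point ties with Q_obs are counted
        if score_statistic(xs, y, w_inv) >= q_obs - 1e-12:
            exceed += 1
    return q_obs, (1 + exceed) / (n_perm + 1)


def compound_symmetry(p: int, rho: float) -> Matrix:
    """Hypothetical working covariance: unit variances, common correlation rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [[1.0, 2.0], [1.0, 2.0], [3.0, 2.0], [3.0, 2.0]]
    print(score_statistic(x, y, invert(compound_symmetry(2, 0.0))))  # 4.0
    print(score_statistic(x, y, invert(compound_symmetry(2, 0.5))))  # ≈ 8.889
    print(permutation_pvalue(x, y, compound_symmetry(2, 0.5)))
```

Setting `rho=0` in the toy example gives the identity working covariance, which is the global-test-type special case. Any other positive-definite W, such as a compound-symmetry matrix or a sample covariance of the expression values, weights the genes by that assumed correlation structure.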
Pages: 13