Self-Contained Gene-Set Analysis of Expression Data: An Evaluation of Existing and Novel Methods

Citations: 47
Authors
Fridley, Brooke L. [1]
Jenkins, Gregory D. [1]
Biernacka, Joanna M. [1]
Affiliations
[1] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA
Keywords
COMBINING P-VALUES; MICROARRAY DATA; INDEPENDENT TESTS; GLOBAL TEST; ASSOCIATION; VARIANCE; MODEL;
DOI
10.1371/journal.pone.0012693
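Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;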
Abstract
Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
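As a rough illustration of Fisher's method for combining p-values, which the abstract names as one of the highest-powered approaches, the following minimal sketch combines per-gene p-values into one gene-set statistic. The function name `fisher_combined_pvalue` is hypothetical, and the chi-square reference distribution used here assumes the per-gene tests are independent. Expression within a gene set is typically correlated, so this closed-form p-value is only an approximation in that setting, and the sketch is not a reproduction of the procedure used in the paper.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine p-values with Fisher's method (hypothetical helper).

    Assumes the individual tests are independent, so that
    X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k, the chi-square survival function
    # has the closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Illustrative per-gene p-values, not data from the study.
    stat, p = fisher_combined_pvalue([0.04, 0.20, 0.01, 0.65])
    print(f"Fisher statistic = {stat:.3f}, combined p = {p:.4f}")
```

With correlated genes, one common alternative is to keep the same statistic but obtain its null distribution by permuting phenotype labels rather than relying on the chi-square approximation.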
Pages: 1-9
Page count: 9