Self-Contained Gene-Set Analysis of Expression Data: An Evaluation of Existing and Novel Methods

Cited by: 47

Authors:

Fridley, Brooke L. [1]

Jenkins, Gregory D. [1]

Biernacka, Joanna M. [1]

Affiliations:

[1] Mayo Clinic, Department of Health Sciences Research, Rochester, MN 55905, USA

Source:

PLOS ONE | 2010 / Volume 5 / Issue 09

Keywords:

COMBINING P-VALUES; MICROARRAY DATA; INDEPENDENT TESTS; GLOBAL TEST; ASSOCIATION; VARIANCE; MODEL;

DOI:

10.1371/journal.pone.0012693

Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
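The abstract names Fisher's method as one of the two most powerful approaches. As a rough illustration only, the sketch below combines gene-level p-values for a single gene set using the textbook form of Fisher's method. The function name `fisher_combined_p` is hypothetical, and the chi-square reference distribution assumes the gene-level tests are independent. Because the study explicitly varies the correlation between genes within a set, that assumption will often fail for real expression data, where a permutation-based null would be the safer choice. This is not the authors' implementation.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method (hypothetical helper).

    Assumes the individual tests are independent, so that
    X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Half of Fisher's statistic: X / 2 = -sum(ln p_i)
    half_stat = -sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k), the chi-square
    # upper tail has a closed form:
    #   P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total
```

With a single p-value the function returns that p-value unchanged, and with two p-values it reduces to the known closed form p1 * p2 * (1 - ln(p1 * p2)).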

Page numbers: 1 / 9

Page count: 9
