A STATISTICAL FRAMEWORK FOR TESTING FUNCTIONAL CATEGORIES IN MICROARRAY DATA

Cited by: 49
Authors
Barry, William T. [1]
Nobel, Andrew B. [2]
Wright, Fred A. [3]
Affiliations
[1] Duke Univ, Med Ctr, Dept Biostat & Bioinformat, Durham, NC 27710 USA
[2] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Keywords
Differential expression; array permutation; bootstrap; Type 1 error; power
DOI
10.1214/07-AOAS146
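Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes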
020208 ; 070103 ; 0714 ;
Abstract
Ready access to emerging databases of gene annotation and functional pathways has shifted assessments of differential expression in DNA microarray studies from single genes to groups of genes with shared biological function. This paper takes a critical look at existing methods for assessing the differential expression of a group of genes (functional category), and provides some suggestions for improved performance. We begin by presenting a general framework, in which the set of genes in a functional category is compared to the complementary set of genes on the array. The framework includes tests for overrepresentation of a category within a list of significant genes, and methods that consider continuous measures of differential expression. Existing tests are divided into two classes. Class 1 tests assume gene-specific measures of differential expression are independent, despite overwhelming evidence of positive correlation. Analytic and simulated results are presented that demonstrate Class 1 tests are strongly anti-conservative in practice. Class 2 tests account for gene correlation, typically through array permutation that by construction has proper Type 1 error control for the induced null. However, both Class 1 and Class 2 tests use a null hypothesis that all genes have the same degree of differential expression. We introduce a more sensible and general (Class 3) null, under which the profile of differential expression is the same within the category and complement. Under this broader null, Class 2 tests are shown to be conservative. We propose standard bootstrap methods for testing against the Class 3 null and demonstrate they provide valid Type 1 error control and more power than array permutation in simulated datasets and real microarray experiments.
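The comparison of a category against its complement, with array permutation used to respect gene correlation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`gene_scores`, `category_stat`, `array_permutation_pvalue`), the choice of a per-gene difference in means as the measure of differential expression, and the choice of a difference in average score as the category statistic are all assumptions made for illustration. It shows only the array permutation idea (Class 2), not the bootstrap procedure proposed for the Class 3 null.

```python
import random
from statistics import mean


def gene_scores(expr, labels):
    """Per-gene difference in mean expression between arrays labeled 1 and 0.

    expr is a list of rows (one row per gene, one value per array);
    labels holds one 0/1 group label per array. Both are hypothetical inputs.
    """
    scores = []
    for row in expr:
        group1 = [x for x, lab in zip(row, labels) if lab == 1]
        group0 = [x for x, lab in zip(row, labels) if lab == 0]
        scores.append(mean(group1) - mean(group0))
    return scores


def category_stat(scores, in_category):
    """Mean score inside the category minus mean score in the complement."""
    inside = [s for s, c in zip(scores, in_category) if c]
    outside = [s for s, c in zip(scores, in_category) if not c]
    return mean(inside) - mean(outside)


def array_permutation_pvalue(expr, labels, in_category, n_perm=999, seed=0):
    """Two-sided p-value obtained by permuting array labels.

    Whole arrays are relabeled, so every gene is permuted together and the
    correlation between genes is left intact in each permuted dataset.
    """
    rng = random.Random(seed)
    observed = category_stat(gene_scores(expr, labels), in_category)
    perm_labels = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm_labels)
        stat = category_stat(gene_scores(expr, perm_labels), in_category)
        if abs(stat) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 4 genes on 6 arrays; the first two genes form the category.
expr = [
    [5, 5, 5, 1, 1, 1],
    [6, 6, 6, 2, 2, 2],
    [3, 3, 3, 3, 3, 3],
    [2, 2, 2, 2, 2, 2],
]
labels = [1, 1, 1, 0, 0, 0]
in_category = [True, True, False, False]
observed, p_value = array_permutation_pvalue(expr, labels, in_category)
```

Permuting gene labels instead of array labels would treat genes as independent, which is the assumption the abstract attributes to Class 1 tests.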
Pages: 286-315
Page count: 30
Related papers
34 items in total
[1]   Microarray data analysis: from disarray to consolidation and consensus [J].
Allison, DB ;
Cui, XQ ;
Page, GP ;
Sabripour, M .
NATURE REVIEWS GENETICS, 2006, 7 (01) :55-65
[2]  
[Anonymous], 1998, INTRO BOOTSTRAP
[3]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[4]   Significance analysis of functional categories in gene expression studies: a structured permutation approach [J].
Barry, WT ;
Nobel, AB ;
Wright, FA .
BIOINFORMATICS, 2005, 21 (09) :1943-1949
[5]  
BARRY WT, 2008, STAT FRAMEWORK TESTI, DOI 10.1214/07-AOAS146SUPPA
[6]   GOstat: find statistically overrepresented Gene Ontologies within a group of genes [J].
Beissbarth, T ;
Speed, TP .
BIOINFORMATICS, 2004, 20 (09) :1464-1465
[7]   Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression [J].
Ben-Shaul, Y ;
Bergman, H ;
Soreq, H .
BIOINFORMATICS, 2005, 21 (07) :1129-1137
[8]   Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses [J].
Bhattacharjee, A ;
Richards, WG ;
Staunton, J ;
Li, C ;
Monti, S ;
Vasa, P ;
Ladd, C ;
Beheshti, J ;
Bueno, R ;
Gillette, M ;
Loda, M ;
Weber, G ;
Mark, EJ ;
Lander, ES ;
Wong, W ;
Johnson, BE ;
Golub, TR ;
Sugarbaker, DJ ;
Meyerson, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (24) :13790-13795
[9]   T-profiler: scoring the activity of predefined groups of genes using gene expression data [J].
Boorsma, A ;
Foat, BC ;
Vis, D ;
Klis, F ;
Bussemaker, HJ .
NUCLEIC ACIDS RESEARCH, 2005, 33 :W592-W595
[10]  
Casella G., 2002, STAT INFERENCE