Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

Times cited: 13
Authors
Aliferis, Constantin F. [1, 2, 3]
Statnikov, Alexander [2]
Tsamardinos, Ioannis [2, 4]
Schildcrout, Jonathan S. [3]
Shepherd, Bryan E. [3]
Harrell, Frank E., Jr. [3]
Affiliations
[1] New York University, Center for Health Informatics and Bioinformatics, New York, NY 10003, USA
[2] Vanderbilt University, Department of Biomedical Informatics, Nashville, TN, USA
[3] Vanderbilt University, Department of Biostatistics, Nashville, TN, USA
[4] University of Crete, Department of Computer Science, Iraklion, Greece
DOI
10.1371/journal.pone.0004922
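Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09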
Abstract
Background: When a molecular signature is developed from microarray or other high-throughput data, its statistical significance must be tested. This step is critical to ensuring that the signature is statistically reproducible. Current best practices emphasize sufficiently powered univariate tests of differential expression. However, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development.

Methodology/Principal Findings: We show that the choice of specific analysis components (i.e., error metric, classifier, error estimator and event balancing) has large and compounding effects on statistical power. We demonstrate these effects empirically in three ways: an analysis of seven of the largest microarray cancer outcome prediction datasets, supplementary simulations, and a comparison of the results with prior analyses of the same data.

Conclusions/Significance: The findings of the present study have two important practical implications. First, by avoiding under-powered data analysis protocols, high-throughput studies can achieve substantial savings in the sample size required to demonstrate the statistical significance of a predictive signal. The study identifies and examines the factors that affect power. As long as these factors are taken into account when designing and executing the analysis, much smaller samples than previously thought may suffice for exploratory studies. Second, previous highly cited claims held that microarray assays may not predict disease outcomes better than chance. Our experiments show that these claims arose from under-powered data analysis combined with inappropriate statistical tests.
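One common way to test whether a signature produced by a multivariate protocol carries predictive signal is a label-permutation test. The test re-runs the entire protocol on each shuffled labelling, and the p-value is the fraction of shuffled runs that perform at least as well as the real labels. The Python sketch below illustrates that idea only; it is not the authors' implementation. It uses the following choices, all of which are assumptions made for brevity rather than settings from the paper:

- the nearest-centroid classifier;
- AUC as the error metric;
- stratified 5-fold cross-validation as the error estimator;
- 200 permutations;
- simulated data.

```python
import random
import statistics


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(rows, labels):
    """Assumed stand-in classifier: per-class mean vector (nearest centroid)."""
    dims = len(rows[0])
    return {
        cls: [statistics.fmean(r[j] for r, y in zip(rows, labels) if y == cls)
              for j in range(dims)]
        for cls in (0, 1)
    }


def centroid_score(centroids, row):
    """Squared distance to the class-0 centroid minus distance to class 1 (higher = class 1)."""
    d0 = sum((a - b) ** 2 for a, b in zip(row, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(row, centroids[1]))
    return d0 - d1


def stratified_folds(labels, k, rng):
    """Split indices into k folds, dealing each class out round-robin to keep proportions."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_auc(rows, labels, k, rng):
    """Pooled out-of-fold AUC: each sample is scored by a model that never saw it."""
    scores = [0.0] * len(labels)
    for test in stratified_folds(labels, k, rng):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            scores[i] = centroid_score(model, rows[i])
    return auc(scores, labels)


def permutation_test(rows, labels, k=5, n_perm=200, seed=0):
    """Re-run the whole protocol (folds, fitting, scoring) on shuffled labels."""
    rng = random.Random(seed)
    observed = cv_auc(rows, labels, k, rng)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cv_auc(rows, shuffled, k, rng) >= observed:
            exceed += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated data (not from the paper): 30 samples per class, 5 features,
    # with a mean shift of 2.0 in the first feature only.
    gen = random.Random(42)
    labels = [0] * 30 + [1] * 30
    rows = [[gen.gauss(2.0 * y if j == 0 else 0.0, 1.0) for j in range(5)] for y in labels]
    observed, p = permutation_test(rows, labels, k=5, n_perm=200, seed=1)
    print(f"CV AUC = {observed:.2f}, permutation p = {p:.3f}")
```

Each of the following is a swappable component: replacing `auc` with plain accuracy, replacing the nearest-centroid stand-in with another classifier, or changing the fold scheme. Any such swap changes the permutation null distribution and hence the power of the test. These are the kinds of component choices the abstract discusses, although the sketch does not reproduce the paper's experiments.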
Pages: 7