An empirical assessment of validation practices for molecular classifiers

Times cited: 68
Authors
Castaldi, Peter J. [1]
Dahabreh, Issa J. [1]
Ioannidis, John P. A. [1]
Affiliation
[1] Stanford Univ, Sch Med, Stanford Prevent Res Ctr, Fac Med Sch, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA);
Keywords
predictive medicine; genes; gene expression; proteomics; STAGE OVARIAN-CANCER; GENE-EXPRESSION DATA; BREAST-CANCER; CELL-CARCINOMA; PUBLISHED MICROARRAY; DIAGNOSTIC-TESTS; CROSS-VALIDATION; STATISTICS NOTES; ERROR ESTIMATION; META-REGRESSION;
DOI
10.1093/bib/bbq073
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21-61%) and 29% (IQR, 15-65%), respectively]. The median reported sensitivity and specificity were 94% and 98%, respectively, in cross-validation, and 88% and 81% in independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04-5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n=758) that cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased, and that genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice.
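The abstract's central methodological concern, that some cross-validation practices overestimate classifier performance, can be illustrated with a small simulation in the spirit of Ambroise and McLachlan [3]. The sketch below is not the authors' analysis: the pure-noise data, the t-statistic gene filter, the nearest-centroid classifier, and all sizes (40 samples, 2,000 genes, 20 selected genes) are illustrative assumptions. It compares leave-one-out accuracy when genes are selected once on the full dataset versus re-selected inside each training fold.

```python
import numpy as np


def t_statistics(X, y):
    """Two-sample (Welch) t statistic for every column (gene) of X."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se


def top_k_genes(X, y, k):
    """Indices of the k genes with the largest absolute t statistic."""
    return np.argsort(-np.abs(t_statistics(X, y)))[:k]


def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose mean profile is closest (Euclidean distance)."""
    distances = [np.linalg.norm(x - X_train[y_train == c].mean(axis=0)) for c in (0, 1)]
    return int(np.argmin(distances))


def loo_accuracy(X, y, k, select_inside_loop):
    """Leave-one-out accuracy with gene selection outside or inside the CV loop."""
    n = len(y)
    # Biased variant: genes chosen once from ALL samples, including each future test sample.
    fixed_genes = None if select_inside_loop else top_k_genes(X, y, k)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Proper variant: genes re-selected from this fold's training samples only.
        genes = top_k_genes(X[train], y[train], k) if select_inside_loop else fixed_genes
        pred = nearest_centroid_predict(X[train][:, genes], y[train], X[i, genes])
        correct += int(pred == y[i])
    return correct / n


# Illustrative sizes (assumptions, not taken from the study).
rng = np.random.default_rng(0)
n_samples, n_genes, k = 40, 2000, 20
X = rng.standard_normal((n_samples, n_genes))  # pure noise: no gene carries class signal
y = np.repeat([0, 1], n_samples // 2)          # 20 samples per class

biased = loo_accuracy(X, y, k, select_inside_loop=False)
proper = loo_accuracy(X, y, k, select_inside_loop=True)
print(f"selection outside CV: {biased:.2f}, selection inside CV: {proper:.2f}")
```

Because no gene carries any signal, an honest estimate should hover near 50%; selecting genes before splitting typically reports far higher accuracy, which is the kind of optimism that external validation is meant to expose.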
Pages: 189-202
Number of pages: 14
References
84 entries in total
[1]   Serum proteome profiling detects myelodysplastic syndromes and identifies CXC chemokine ligands 4 and 7 as markers for advanced disease [J].
Aivado, Manuel ;
Spentzos, Dimitrios ;
Germing, Ulrich ;
Alterovitz, Gil ;
Meng, Xiao-Ying ;
Grall, Franck ;
Giagounidis, Aristoteles A. N. ;
Klement, Giannoula ;
Steidl, Ulrich ;
Otu, Hasan H. ;
Czibere, Akos ;
Prall, Wolf C. ;
Iking-Konert, Christof ;
Shayne, Michelle ;
Ramoni, Marco F. ;
Gattermann, Norbert ;
Haas, Rainer ;
Mitsiades, Constantine S. ;
Fung, Eric T. ;
Libermann, Towia A. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (04) :1307-1312
[2]   Statistics Notes - Interaction revisited: the difference between two estimates [J].
Altman, DG ;
Bland, JM .
BMJ-BRITISH MEDICAL JOURNAL, 2003, 326 (7382) :219-219
[3]   Selection bias in gene extraction on the basis of microarray gene-expression data [J].
Ambroise, C ;
McLachlan, GJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (10) :6562-6566
[4]   Exonic expression profiling of breast cancer and benign lesions: a retrospective analysis [J].
Andre, Fabrice ;
Michiels, Stefan ;
Dessen, Philippe ;
Scott, Veronique ;
Suciu, Voichita ;
Uzan, Catherine ;
Lazar, Vladimir ;
Lacroix, Ludovic ;
Vassal, Gilles ;
Spielmann, Marc ;
Vielh, Philippe ;
Delaloge, Suzette .
LANCET ONCOLOGY, 2009, 10 (04) :381-390
[5]   R: A Language and Environment for Statistical Computing.
[Anonymous] .
2010
[6]   Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer [J].
Ayers, M ;
Symmans, WF ;
Stec, J ;
Damokosh, AI ;
Clark, E ;
Hess, K ;
Lecocke, M ;
Metivier, J ;
Booser, D ;
Ibrahim, N ;
Valero, V ;
Royce, M ;
Arun, B ;
Whitman, G ;
Ross, J ;
Sneige, N ;
Hortobagyi, GN ;
Pusztai, L .
JOURNAL OF CLINICAL ONCOLOGY, 2004, 22 (12) :2284-2293
[7]   Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments [J].
Baggerly, KA ;
Morris, JS ;
Coombes, KR .
BIOINFORMATICS, 2004, 20 (05) :777-785
[8]   Statistics notes - The odds ratio [J].
Bland, JM ;
Altman, DG .
BRITISH MEDICAL JOURNAL, 2000, 320 (7247) :1468-1468
[9]   Cytosolic N-terminal arginine-based signals together with a luminal signal target a type II membrane protein to the plant ER [J].
Boulaflous, Aurelia ;
Saint-Jore-Dupas, Claude ;
Herranz-Gordo, Marie-Carmen ;
Pagny-Salehabadi, Sophie ;
Plasson, Carole ;
Garidou, Frederic ;
Kiefer-Meyer, Marie-Christine ;
Ritzenthaler, Christophe ;
Faye, Loic ;
Gomord, Veronique .
BMC PLANT BIOLOGY, 2009, 9
[10]   Is cross-validation valid for small-sample microarray classification? [J].
Braga-Neto, UM ;
Dougherty, ER .
BIOINFORMATICS, 2004, 20 (03) :374-380