On selection biases with prediction rules formed from gene expression data

Cited by: 12
Authors
Zhu, J. X. [1 ,2 ]
McLachlan, G. J. [1 ,2 ,3 ]
Ben-Tovim Jones, L. [1 ,2 ,3 ]
Wood, I. A. [4 ]
Affiliations
[1] Univ Queensland, Dept Math, St Lucia, Qld 4072, Australia
[2] Univ Queensland, ARC Ctr Bioinformat, St Lucia, Qld 4072, Australia
[3] Univ Queensland, Inst Mol Biosci, St Lucia, Qld 4072, Australia
[4] QUT Gardens Point, Sch Math Sci, Brisbane, Qld 4001, Australia
Funding
Australian Research Council; UK Medical Research Council;
Keywords
gene expression data; selection bias; discriminant analysis; support vector machine; cross-validation;
DOI
10.1016/j.jspi.2007.06.003
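Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract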
There has been ever-increasing interest in the use of microarray experiments as a basis for the provision of prediction (discriminant) rules for improved diagnosis of cancer and other diseases. Typically, microarray cancer studies provide only a limited number of tissue samples from the specified classes of tumours or patients, whereas each tissue sample may contain the expression levels of thousands of genes. Thus researchers are faced with the problem of forming a prediction rule on the basis of a small number of classified tissue samples, which are of very high dimension. Usually, some form of feature (gene) selection is adopted in the formation of the prediction rule. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criterion designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rules if care is not taken. We shall present various situations where selection bias arises in the formation of a prediction rule and where there is a consequent need for the correction of this bias. We describe the design of cross-validation schemes that are able to correct for the various selection biases. © 2007 Elsevier B.V. All rights reserved.
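The selection bias described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the simulated pure-noise data, the mean-difference gene ranking, the nearest-centroid rule, and all function names (`select_genes`, `fit_centroids`, `predict`, `cv_error`) are assumptions made for illustration. The sketch compares cross-validation in which genes are selected once on all samples (internal, biased) with cross-validation in which genes are reselected within each training fold (external).

```python
import random

def select_genes(X, y, k):
    """Return indices of the k genes with the largest absolute class-mean difference."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]

def fit_centroids(X, y, genes):
    """Compute the class centroids restricted to the selected genes."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return cents

def predict(cents, row, genes):
    """Assign a sample to the class with the nearest centroid."""
    def dist(lab):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, cents[lab]))
    return 0 if dist(0) <= dist(1) else 1

def cv_error(X, y, k, n_folds, external):
    """Cross-validated error rate; gene selection is redone per fold only if external."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    genes_all = select_genes(X, y, k)  # sees every sample, including future test samples
    errors = 0
    for test in folds:
        train = [i for i in range(n) if i not in test]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # External CV: select genes from the training fold only.
        genes = select_genes(X_train, y_train, k) if external else genes_all
        cents = fit_centroids(X_train, y_train, genes)
        errors += sum(predict(cents, X[i], genes) != y[i] for i in test)
    return errors / n

# Simulated data (assumption): 40 samples, 1000 genes of pure noise, so no gene
# carries real class information and the true error rate of any rule is 50%.
rng = random.Random(0)
n_samples, n_genes = 40, 1000
X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y = [0] * (n_samples // 2) + [1] * (n_samples // 2)

biased = cv_error(X, y, k=10, n_folds=10, external=False)
external = cv_error(X, y, k=10, n_folds=10, external=True)
print(f"internal (biased) CV error: {biased:.2f}")
print(f"external CV error:          {external:.2f}")
```

Because the data contain no signal, an honest estimate should sit near 50%. The internal scheme reports a much lower error only because the genes were chosen with knowledge of the held-out samples' labels, which is the bias the external scheme is designed to avoid.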
Pages: 374-386
Number of pages: 13