Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data (art. no. 239)

Times cited: 8
Authors
Baty, F [1]
Bihl, MP
Perrière, G
Culhane, AC
Brutsche, MH
Affiliations
[1] Univ Basel Hosp, CH-4031 Basel, Switzerland
[2] Univ Lyon 1, Lab Biometrie & Biol Evolut, CNRS, UMR 5558, F-69622 Villeurbanne, France
[3] Univ Coll Dublin, Bioinformat Conway Inst, Dublin 2, Ireland
DOI
10.1186/1471-2105-6-239
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: A recent publication described a supervised classification method for microarray data, Between Group Analysis (BGA). This method, which is based on multivariate ordination of groups, proved very efficient both for classifying samples into predefined groups and for predicting the disease class of new, unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, with no variable selection. We hypothesize that an optimized selection of highly discriminating genes might improve the predictive power of BGA.
Results: We propose an optimized between-group classification (OBC), which uses a jackknife-based gene selection procedure. OBC emphasizes classification accuracy rather than feature selection. It is a backward optimization procedure that maximizes the percentage of between-group inertia by removing the least influential genes one by one from the analysis. This selects a subset of highly discriminating genes that optimizes disease class prediction. We applied OBC to four datasets and compared it with other classification methods.
Conclusion: OBC considerably improved the classification and predictive accuracy of BGA when assessed using independent datasets and leave-one-out cross-validation.
Availability: The R code is freely available [see Additional file 1], as is supplementary information [see Additional file 2].
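The abstract describes OBC only at a high level. As a rough illustration, and not the authors' R code from Additional file 1, the Python sketch below computes a between-group inertia ratio, sum over groups of n_g * ||mean_g - grand mean||^2 divided by sum over samples of ||x_i - grand mean||^2, and runs a jackknife-style backward elimination: each remaining gene is left out in turn, and the gene whose removal gives the highest ratio is treated as the least influential and dropped. Assumptions: inertia is plain Euclidean inertia on column-centred data with equal sample weights (a PCA-style BGA rather than a correspondence-analysis variant), and the function names and the `min_genes` floor are hypothetical. Because this inertia is additive over genes, the ratio never decreases along the elimination path, so a fixed floor stops the search here; the paper's own stopping rule and its validation (independent test sets, leave-one-out cross-validation) are not reproduced.

```python
import numpy as np


def between_group_inertia_ratio(X, groups):
    """Share of total inertia carried by the group means (between-group / total).

    Assumption: Euclidean inertia on column-centred data with equal sample
    weights (PCA-style BGA), not the correspondence-analysis variant.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centred = X - X.mean(axis=0)
    total = np.sum(centred ** 2)
    between = 0.0
    for g in np.unique(groups):
        members = centred[groups == g]
        # n_g times the squared distance from the group mean to the grand mean
        between += len(members) * np.sum(members.mean(axis=0) ** 2)
    return between / total


def backward_jackknife_selection(X, groups, min_genes=2):
    """Hypothetical sketch of jackknife-driven backward gene elimination.

    At each step every remaining gene (column) is left out in turn; the gene
    whose removal yields the highest between-group inertia ratio is dropped.
    Returns the path as a list of (kept gene indices, ratio) pairs.
    `min_genes` is an assumed stopping floor, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    kept = list(range(X.shape[1]))
    path = [(list(kept), between_group_inertia_ratio(X[:, kept], groups))]
    while len(kept) > min_genes:
        ratios = []
        for j in kept:
            subset = [k for k in kept if k != j]
            ratios.append(between_group_inertia_ratio(X[:, subset], groups))
        least_influential = kept[int(np.argmax(ratios))]
        kept.remove(least_influential)
        path.append((list(kept), max(ratios)))
    return path


# Toy data (synthetic, for illustration only): 20 samples, 6 genes,
# where only genes 0 and 1 are shifted in group "B".
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B"], 10)
X = rng.normal(size=(20, 6))
X[groups == "B", :2] += 3.0

path = backward_jackknife_selection(X, groups, min_genes=2)
best_genes, best_ratio = max(path, key=lambda step: step[1])
print(best_genes, round(best_ratio, 3))
```

On this toy data the noise genes are eliminated first and the two shifted genes remain. In a real analysis, the retained subset would then be used to build the BGA classifier and assessed on held-out samples.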
Pages: 1-12 (12 pages)