Simple and flexible classification of gene expression microarrays via Swirls and Ripples

Cited by: 15
Author
Baker, Stuart G. [1]
Affiliation
[1] NCI, Biometry Res Grp, Canc Prevent Div, Bethesda, MD 20892 USA
Keywords
RELATIVE UTILITY CURVES; DISCRIMINANT-ANALYSIS; RISK PREDICTION; CANCER; TUMOR; CARCINOGENESIS; PERSPECTIVE; DISCOVERY; SELECTION; CELLS;
DOI
10.1186/1471-2105-11-452
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: A simple classification rule with few genes and parameters is desirable when applying the rule to new data. One popular simple classification rule, diagonal discriminant analysis, yields linear or curved classification boundaries, called Ripples, that are optimal when gene expression levels are normally distributed with the appropriate variance, but may yield poor classification in other situations.
Results: A simple modification of diagonal discriminant analysis yields smooth, highly nonlinear classification boundaries, called Swirls, that sometimes outperform Ripples. In particular, if the data are normally distributed with different variances in each class, Swirls substantially outperform Ripples when a pooled variance is used to reduce the number of parameters. The proposed classification rule for two classes selects either Swirls or Ripples after parsimoniously selecting the number of genes and distance measures. Applications to five cancer microarray data sets identified predictive genes related to the tissue organization theory of carcinogenesis.
Conclusion: The parsimonious selection of classifiers, coupled with the selection of either Swirls or Ripples, provides a good basis for formulating a simple yet flexible classification rule. Open source software is available for download.
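To make the pooled-versus-separate variance point concrete, the following is a minimal sketch of textbook two-class diagonal discriminant analysis (independent normal genes, equal class priors). It is not the paper's software and does not implement Swirls, whose definition the abstract does not give; the function names `fit_diagonal_da` and `classify` and the toy data are hypothetical, chosen only for illustration.

```python
import math

def fit_diagonal_da(class0, class1, pooled=True):
    """Estimate per-gene means and variances for two classes.

    class0, class1: lists of samples, each sample a list of expression values.
    pooled=True shares one variance per gene across both classes.
    """
    n_genes = len(class0[0])
    means, sums_sq = [], []
    for rows in (class0, class1):
        m = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        ss = [sum((r[g] - m[g]) ** 2 for r in rows) for g in range(n_genes)]
        means.append(m)
        sums_sq.append(ss)
    n0, n1 = len(class0), len(class1)
    if pooled:
        shared = [(sums_sq[0][g] + sums_sq[1][g]) / (n0 + n1 - 2)
                  for g in range(n_genes)]
        variances = [shared, shared]
    else:
        variances = [[s / (n0 - 1) for s in sums_sq[0]],
                     [s / (n1 - 1) for s in sums_sq[1]]]
    return {"means": means, "variances": variances}

def classify(model, sample):
    """Return the class (0 or 1) with the larger normal log-likelihood."""
    scores = []
    for k in (0, 1):
        score = 0.0
        for x, m, v in zip(sample, model["means"][k], model["variances"][k]):
            score += -0.5 * math.log(v) - (x - m) ** 2 / (2.0 * v)
        scores.append(score)
    return 1 if scores[1] > scores[0] else 0

# Toy single-gene data (invented): similar means, very different spreads.
tight = [[-0.1], [0.1], [0.0], [0.2], [-0.2]]   # class 0
wide = [[-3.0], [3.0], [-2.5], [2.5], [0.1]]    # class 1

pooled_model = fit_diagonal_da(tight, wide, pooled=True)
separate_model = fit_diagonal_da(tight, wide, pooled=False)

for x in (-3.0, 0.05, 3.0):
    print(x, classify(pooled_model, [x]), classify(separate_model, [x]))
```

With a pooled variance the log-variance terms cancel and the rule reduces to a comparison of distances to the two class means, so the boundary is linear. With class-specific variances the boundary becomes quadratic, which lets the rule assign both extreme tails to the high-variance class in this toy setting.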
Pages: 9