Optimality Driven Nearest Centroid Classification from Genomic Data

Times cited: 26
Authors
Dabney, Alan R. [1]
Storey, John D. [2,3]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Source
PLOS ONE | 2007 / Vol. 2 / No. 10
DOI
10.1371/journal.pone.0001002
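Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09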
Abstract
Nearest-centroid classifiers have recently been employed successfully in high-dimensional applications such as genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing a univariate score for each feature individually, without considering how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that is instead based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that it can outperform existing nearest-centroid classifiers.
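The abstract does not state the optimality criterion or the greedy algorithm in detail, so the following Python sketch only illustrates the general idea under explicit assumptions; it is not the authors' method. It fits maximum likelihood class centroids with a pooled per-feature variance, classifies by variance-scaled (diagonal) distance to the nearest centroid, and greedily adds features to minimize a hypothetical subset-level criterion: the pairwise bound (2/K) Σ_{k<l} Φ(-Δ_kl/2) on the equal-prior error, which would hold if the selected features were independent Gaussians with common variance and the estimates were exact. The function names (`fit_centroids`, `predict`, `pairwise_error_bound`, `greedy_select`) are hypothetical, and the shrinkage centroid estimates mentioned in the abstract are not included.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_centroids(X, y):
    """ML class centroids plus pooled within-class variance for each feature."""
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    resid = X - centroids[np.searchsorted(classes, y)]
    dof = max(len(y) - len(classes), 1)
    pooled_var = (resid ** 2).sum(axis=0) / dof
    # Floor the variances so constant features do not cause division by zero.
    return classes, centroids, np.maximum(pooled_var, 1e-12)


def predict(X, classes, centroids, pooled_var, features):
    """Assign each row of X to the nearest centroid, using variance-scaled
    squared distance restricted to the given feature indices."""
    f = list(features)
    diffs = X[:, f][:, None, :] - centroids[:, f][None, :, :]  # (n, K, |f|)
    d2 = (diffs ** 2 / pooled_var[f]).sum(axis=2)              # (n, K)
    return classes[np.argmin(d2, axis=1)]


def pairwise_error_bound(centroids, pooled_var, features):
    """Hypothetical subset criterion: (2/K) * sum_{k<l} Phi(-Delta_kl / 2).

    Delta_kl is the variance-scaled distance between centroids k and l on
    `features`. If those features were independent Gaussians with common
    variance and the estimates were exact, this would upper-bound the
    equal-prior error of the nearest-centroid rule.
    """
    f = list(features)
    K = centroids.shape[0]
    total = 0.0
    for k in range(K):
        for l in range(k + 1, K):
            gap = centroids[k, f] - centroids[l, f]
            delta = math.sqrt(float((gap ** 2 / pooled_var[f]).sum()))
            total += norm_cdf(-delta / 2.0)
    return 2.0 * total / K


def greedy_select(centroids, pooled_var, m):
    """Forward selection: repeatedly add the feature that most lowers the
    criterion for the subset as a whole (ties go to the lowest index)."""
    selected = []
    remaining = set(range(centroids.shape[1]))
    for _ in range(m):
        best = min(sorted(remaining),
                   key=lambda j: pairwise_error_bound(centroids, pooled_var, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy setting: features 0 and 3 both separate class 0 from classes 1-2,
    # feature 1 separates class 1 from class 2, feature 2 is weak.
    C = np.array([[0.0, 0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.5, 4.0],
                  [4.0, 3.0, 0.0, 4.0]])
    v = np.ones(4)
    print(greedy_select(C, v, 2))  # [0, 1]
```

Scored one at a time, features 0 and 3 tie for best in the toy example, but they carry the same separation; the greedy pass instead pairs feature 0 with feature 1, because only that combination also separates classes 1 and 2. In practice one would call `fit_centroids` on training data, run `greedy_select` on the estimated centroids and variances, and evaluate `predict` on held-out samples.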
Pages: 7