Finding optimal classifiers for small feature sets in genomics and proteomics

Cited by: 5
Authors
Stiglic, Gregor [1]
Rodriguez, Juan J. [2]
Kokol, Peter [1, 3]
Affiliations
[1] Univ Maribor, Fac Hlth Sci, SLO-2000 Maribor, Slovenia
[2] Univ Burgos, Burgos 09006, Spain
[3] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia
Keywords
Gene expression analysis; Machine learning; Feature selection; Rotation Forest
DOI
10.1016/j.neucom.2010.02.024
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The classification of genomic and proteomic data in extremely high-dimensional datasets is a well-known problem that requires appropriate classification techniques. Classification methods are usually combined with gene selection techniques to provide optimal classification conditions, i.e., a lower-dimensional classification environment. Another reason for reducing the dimensionality of such datasets is interpretability, as it is much easier to interpret a small set of ranked genes than 20 thousand genes. This paper evaluates the classification performance of the Rotation Forest classifier on small subsets of ranked genes for two dataset collections consisting of 47 genomic and proteomic classification problems. Robustness and high classification accuracy are shown to be important features of Rotation Forest when applied to small sets of genes. © 2010 Elsevier B.V. All rights reserved.
Pages: 2346-2352
Number of pages: 7