LESS: A model-based classifier for sparse subspaces

Cited by: 26
Authors
Veenman, CJ [1]
Tax, DMJ [1]
Affiliations
[1] Delft Univ Technol, Fac Elect Engn, Dept Mediamat, NL-2600 GA Delft, Netherlands
Keywords
classification; support vector machine; high-dimensional; feature subset selection; mathematical programming
DOI
10.1109/TPAMI.2005.182
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we specifically focus on high-dimensional data sets for which the number of dimensions is an order of magnitude higher than the number of objects. From a classifier design standpoint, such small sample size problems have some interesting challenges. The first challenge is to find, from all hyperplanes that separate the classes, a separating hyperplane which generalizes well for future data. A second important task is to determine which features are required to distinguish the classes. To attack these problems, we propose the LESS (Lowest Error in a Sparse Subspace) classifier that efficiently finds linear discriminants in a sparse subspace. In contrast with most classifiers for high-dimensional data sets, the LESS classifier incorporates a (simple) data model. Further, by means of a regularization parameter, the classifier establishes a suitable trade-off between subspace sparseness and classification accuracy. In the experiments, we show how LESS performs on several high-dimensional data sets and compare its performance to related state-of-the-art classifiers like, among others, linear ridge regression with the LASSO and the Support Vector Machine. It turns out that LESS performs competitively while using fewer dimensions.
Pages: 1496-1500
Number of pages: 5