Sparse Discriminant Analysis

Cited by: 385
Authors
Clemmensen, Line [1]
Hastie, Trevor [2]
Witten, Daniela [3]
Ersboll, Bjarne [1]
Affiliations
[1] Tech Univ Denmark, Dept Informat & Math Modelling, DK-2800 Lyngby, Denmark
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
Classification; Dimension reduction; Feature selection; Linear discriminant analysis; Mixture discriminant analysis; PARTIAL LEAST-SQUARES; REGRESSION; CLASSIFICATION; PREDICTION; DIAGNOSIS; SELECTION;
DOI
10.1198/TECH.2011.08118
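Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code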
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of performing interpretable classification in the high-dimensional setting, in which the number of features is very large and the number of observations is limited. This setting has been studied extensively in the chemometrics literature, and more recently has become commonplace in biological and medical applications. In this setting, a traditional approach involves performing feature selection before classification. We propose sparse discriminant analysis, a method for performing linear discriminant analysis with a sparseness criterion imposed such that classification and feature selection are performed simultaneously. Sparse discriminant analysis is based on the optimal scoring interpretation of linear discriminant analysis, and can be extended to perform sparse discrimination via mixtures of Gaussians if boundaries between classes are nonlinear or if subgroups are present within each class. Our proposal also provides low-dimensional views of the discriminative directions.
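As a rough illustration of what a sparseness criterion on top of the optimal scoring form of linear discriminant analysis can look like, the k-th discriminant direction may be written as a penalized regression problem. The notation is an assumption and does not appear in this record: X is the n x p data matrix, Y is the n x K matrix of class indicators, \theta_k is the k-th score vector, \beta_k is the k-th discriminant vector, \Omega is a positive definite penalty matrix, and \gamma, \lambda \ge 0 are tuning parameters.

```latex
\[
\begin{aligned}
\min_{\beta_k,\,\theta_k}\quad
  & \lVert Y\theta_k - X\beta_k \rVert_2^2
    + \gamma\,\beta_k^{\top}\Omega\,\beta_k
    + \lambda\,\lVert \beta_k \rVert_1 \\
\text{subject to}\quad
  & \tfrac{1}{n}\,\theta_k^{\top} Y^{\top} Y\,\theta_k = 1,
    \qquad
    \theta_k^{\top} Y^{\top} Y\,\theta_l = 0 \quad \text{for all } l < k .
\end{aligned}
\]
```

In this sketch the \ell_1 term sets some entries of \beta_k exactly to zero, which is how feature selection and classification can happen in a single fit, and projecting the data onto the first few \beta_k gives the low-dimensional views mentioned in the abstract.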
Pages: 406-413
Number of pages: 8