Keywords:
Feature selection;
linear discriminant analysis;
correlation;
James-Stein estimator;
"small n, large p" setting;
correlation-adjusted t-score;
false discovery rates;
higher criticism;
LINEAR DISCRIMINANT-ANALYSIS;
SHRUNKEN CENTROIDS;
CLASSIFICATION;
REGRESSION;
DISCOVERY;
RANKING;
BAYES;
DOI:
10.1214/09-AOAS277
Chinese Library Classification (CLC):
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Subject classification codes:
020208;
070103;
0714;
Abstract:
We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James-Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package "sda" available from the R repository CRAN.
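As a rough illustration of the correlation-adjusted t-scores (cat scores) mentioned in the abstract: the cat score of a feature is its ordinary t-score decorrelated by the inverse square root of the feature correlation matrix, roughly cat = R^{-1/2} t. The R sketch below shows only this two-class construction; the function name catscore_sketch, the toy data, and the use of the plain empirical correlation matrix in place of the James-Stein shrinkage estimates described in the abstract are assumptions made to keep the example short, not the authors' implementation. The reference implementation is the "sda" package on CRAN.

```r
## Minimal sketch of the cat-score idea (two-class case); illustration only.
catscore_sketch <- function(X, y) {
  # X: n x p data matrix; y: two-class label vector or factor
  y  <- factor(y)
  g  <- levels(y)
  X1 <- X[y == g[1], , drop = FALSE]
  X2 <- X[y == g[2], , drop = FALSE]
  n1 <- nrow(X1); n2 <- nrow(X2)

  # pooled variances and ordinary two-sample t-scores
  v      <- ((n1 - 1) * apply(X1, 2, var) +
             (n2 - 1) * apply(X2, 2, var)) / (n1 + n2 - 2)
  tscore <- (colMeans(X1) - colMeans(X2)) / sqrt(v * (1 / n1 + 1 / n2))

  # correlation matrix of the predictors; for p > n this should be a
  # shrinkage estimate as in the paper -- plain cor() keeps the sketch short
  R <- cor(X)

  # inverse matrix square root R^{-1/2} via eigendecomposition
  e        <- eigen(R, symmetric = TRUE)
  Rinvhalf <- e$vectors %*% diag(1 / sqrt(pmax(e$values, 1e-8))) %*% t(e$vectors)

  # correlation-adjusted t-scores: cat = R^{-1/2} %*% t
  drop(Rinvhalf %*% tscore)
}

## toy usage: 20 samples, 50 features, two classes
set.seed(1)
X <- matrix(rnorm(20 * 50), 20, 50)
y <- rep(c("A", "B"), each = 10)
cat.scores <- catscore_sketch(X, y)
```

In the CRAN package itself, ranking by cat scores and shrinkage-based training are provided by functions such as sda.ranking() and sda() (names as per the package documentation), which also handle the FNDR-based thresholding and the analytic choice of regularization parameters described above.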
Affiliations:
Donoho, David; Jin, Jiashun
Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA