Asymmetric linear dimension reduction for classification

Times cited: 28
Author
Hennig, C [1]
Affiliation
[1] Univ Hamburg, Fachbereich Math, D-20146 Hamburg, Germany
Keywords
canonical coordinates; cluster validation; discriminant coordinates; MCD estimator; nearest neighbor; projection pursuit; quasars; visualization
DOI
10.1198/106186004X12740
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
This article discusses methods to project a p-dimensional dataset with classified points from s known classes onto a lower-dimensional hyperplane so that the classes appear optimally separated. Such projections can be used, for example, for data visualization and for classification in lower dimensions. New methods, which are asymmetric with respect to the numbering of the groups, are introduced for s = 2. They aim at generating data projections in which one class is homogeneous and optimally separated from the other class, while the other class may be widespread. They are compared to classical discriminant coordinates and other symmetric methods from the literature by a simulation study, by application to a 12-dimensional dataset of 74,159 spectra of stellar objects, and by application to land snail distribution data. Neighborhood-based methods, in which local information about the separation of the classes is averaged, are also investigated. The use of robust MCD covariance matrices is suggested.
Pages: 930-945
Number of pages: 16
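The abstract contrasts classical discriminant coordinates with asymmetric projections for s = 2. The NumPy sketch below illustrates why the distinction matters: with two classes the between-class scatter matrix has rank 1, so classical discriminant coordinates give only one informative direction, whereas a criterion built around one homogeneous class can give several, which is what a 2-D visualization needs. All function names and the toy data are hypothetical. In particular, `asymmetric_dc_sketch` is one plausible formalization of the idea stated in the abstract (keep one class compact, push the other away from its center), not necessarily the criterion defined in the paper. Ordinary sample covariances are used throughout; the abstract suggests robust MCD covariance estimates instead, which this sketch does not implement.

```python
import numpy as np


def _generalized_eigh(a, w):
    """Solve a v = lam * w v for symmetric a and positive definite w.

    Uses the Cholesky factor w = L L^T to reduce the problem to the
    ordinary symmetric eigenproblem of L^{-1} a L^{-T}. Eigenvalues are
    returned in descending order; eigenvector columns satisfy v^T w v = 1.
    """
    l_inv = np.linalg.inv(np.linalg.cholesky(w))
    vals, u = np.linalg.eigh(l_inv @ a @ l_inv.T)  # ascending order
    vecs = l_inv.T @ u  # map back: v = L^{-T} u
    return vals[::-1], vecs[:, ::-1]


def classical_dc(x1, x2):
    """Classical (symmetric) discriminant coordinates for two classes.

    Eigenvectors of W^{-1} B, with W the pooled within-class covariance
    and B the between-class scatter around the grand mean.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    w = ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    m = (n1 * m1 + n2 * m2) / (n1 + n2)
    b = n1 * np.outer(m1 - m, m1 - m) + n2 * np.outer(m2 - m, m2 - m)
    # For s = 2, B has rank 1: only the first eigenvalue is nonzero, and
    # its eigenvector is proportional to W^{-1} (m1 - m2).
    return _generalized_eigh(b, w)


def asymmetric_dc_sketch(x_hom, x_other):
    """HYPOTHETICAL asymmetric criterion, for illustration only.

    x_hom should appear compact; x_other may be spread out. Directions
    maximize the mean squared distance of x_other from the mean of x_hom,
    relative to the covariance of x_hom.
    """
    m_hom = x_hom.mean(axis=0)
    s_hom = np.cov(x_hom, rowvar=False)
    d = x_other - m_hom
    spread = d.T @ d / len(x_other)  # generally full rank
    return _generalized_eigh(spread, s_hom)


def project(x, vecs, k=2):
    """Project the rows of x onto the first k coordinates."""
    return x @ vecs[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=(200, 4))                        # homogeneous class
    x2 = rng.normal(loc=2.0, scale=3.0, size=(300, 4))    # widespread class
    vals_dc, _ = classical_dc(x1, x2)
    vals_a, vecs_a = asymmetric_dc_sketch(x1, x2)
    print("classical eigenvalues:", np.round(vals_dc, 6))
    print("asymmetric eigenvalues:", np.round(vals_a, 3))
    xy = project(np.vstack([x1, x2]), vecs_a)  # 2-D coordinates for a plot
    print("projected shape:", xy.shape)
```

The Cholesky reduction keeps everything within `numpy.linalg`; with SciPy available, `scipy.linalg.eigh(a, w)` solves the same generalized problem directly.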