Asymmetric linear dimension reduction for classification

Cited by: 28
Authors
Hennig, C [1]
Affiliation
[1] Univ Hamburg, Fachbereich Math, D-20146 Hamburg, Germany
Keywords
canonical coordinates; cluster validation; discriminant coordinates; MCD estimator; nearest neighbor; projection pursuit; quasars; visualization
DOI
10.1198/106186004X12740
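Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract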
This article discusses methods to project a p-dimensional dataset with classified points from s known classes onto a lower dimensional hyperplane so that the classes appear optimally separated. Such projections can be used, for example, for data visualization and classification in lower dimensions. New methods, which are asymmetric with respect to the numbering of the groups, are introduced for s = 2. They aim at generating data projections where one class is homogeneous and optimally separated from the other class, while the other class may be widespread. They are compared to classical discriminant coordinates and other symmetric methods from the literature by a simulation study, the application to a 12-dimensional dataset of 74,159 spectra of stellar objects, and to land snails distribution data. Neighborhood-based methods are also investigated, where local information about the separation of the classes is averaged. The use of robust MCD-covariance matrices is suggested.
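As a point of reference for the classical baseline named in the abstract, the following is a minimal sketch of the first discriminant coordinate for s = 2 classes, i.e. the direction that maximizes between-class separation relative to the pooled within-class scatter. It is not the asymmetric method introduced in the article. The function name `first_discriminant_coordinate` and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def first_discriminant_coordinate(X, y):
    """Unit vector of the classical (symmetric) first discriminant coordinate.

    Hypothetical helper for illustration; handles exactly two classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two classes")

    X0, X1 = X[y == labels[0]], X[y == labels[1]]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled within-class scatter matrix W (sum of both class scatters).
    W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

    # The maximizing direction is proportional to W^{-1} (m1 - m0).
    w = np.linalg.solve(W, m1 - m0)
    return w / np.linalg.norm(w)


# Simulated example (assumed data): two 3-dimensional Gaussian classes
# whose means differ only in the first coordinate.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 3)),
    rng.normal(0.0, 1.0, size=(100, 3)) + np.array([3.0, 0.0, 0.0]),
])
y = np.repeat([0, 1], 100)

w = first_discriminant_coordinate(X, y)
z = X @ w  # one-dimensional projection of every point
```

With only two classes the between-class scatter has rank one, so this classical approach yields a single informative direction. The sample scatter used here could be replaced by a robust estimate such as the MCD covariance matrix that the abstract suggests.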
Pages: 930-945
Page count: 16