A New Dimension Reduction Method: Factor Discriminant K-means

Cited by: 17
Authors
Rocci, Roberto [1]
Gattone, Stefano Antonio [1]
Vichi, Maurizio [2]
Affiliations
[1] Univ Roma Tor Vergata, Dept SEFeMeQ, Rome, Italy
[2] Univ Roma La Sapienza, Dept Stat Probabil & Appl Stat, Rome, Italy
Keywords
Cluster analysis; Dimension reduction; K-Means; Principal Component Analysis
DOI
10.1007/s00357-011-9085-9
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques that incorporate principal component analysis and K-means into a unified methodology to obtain a reduced set of components for variables and an optimal partition for objects. RKM finds clusters in a reduced space by maximizing the between-clusters deviance without imposing any condition on the within-clusters deviance, so that clusters are isolated but might be heterogeneous. FKM, on the other hand, identifies clusters in a reduced space by minimizing the within-clusters deviance without imposing any condition on the between-clusters deviance, so that clusters are homogeneous but might not be isolated. The two techniques give different results because the total deviance in the reduced space is not constant for either methodology; hence minimizing the within-clusters deviance is not equivalent to maximizing the between-clusters deviance. In this paper a modification of the two techniques is introduced to avoid the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology. It is called Factor Discriminant K-means (FDKM) because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and its performance in a simulation study. An application to real-world data is presented to show the features of FDKM.
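The point that the total deviance in the reduced space is not constant can be illustrated with a small numerical sketch. The code below is not the authors' RKM, FKM, or FDKM algorithm: it assumes a fixed two-cluster partition, synthetic data, and three hypothetical one-dimensional projections (one per original variable), and it only compares the within- and between-clusters deviances of each projection. The function name `deviances` and all data settings are assumptions made for illustration.

```python
import numpy as np

def deviances(X, labels, A):
    """Total, within- and between-clusters deviance of X projected onto A."""
    Y = (X - X.mean(axis=0)) @ A          # centred scores in the reduced space
    total = float(np.sum(Y ** 2))
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        Yk = Y[labels == k]
        centroid = Yk.mean(axis=0)
        within += float(np.sum((Yk - centroid) ** 2))
        # Scores are centred, so the grand mean is zero.
        between += len(Yk) * float(np.sum(centroid ** 2))
    return total, within, between

# Synthetic data (assumed): variable 1 separates the clusters,
# variable 2 is high-variance noise, variable 3 is low-variance noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3)) * np.array([1.0, 5.0, 0.5])
X[: n // 2, 0] += 4.0
labels = np.repeat([0, 1], n // 2)

# Candidate one-dimensional reduced spaces: each original variable in turn.
candidates = {f"variable {j + 1}": np.eye(3)[:, [j]] for j in range(3)}
results = {name: deviances(X, labels, A) for name, A in candidates.items()}

# Projection preferred by a "minimize within" criterion versus
# the one preferred by a "maximize between" criterion.
best_within = min(results, key=lambda name: results[name][1])
best_between = max(results, key=lambda name: results[name][2])

for name, (t, w, b) in results.items():
    print(f"{name}: total={t:.1f} within={w:.1f} between={b:.1f}")
print("min within ->", best_within, "| max between ->", best_between)
```

Because the total deviance changes with the projection, the projection with the smallest within-clusters deviance need not be the one with the largest between-clusters deviance; in this toy setting the low-variance noise variable gives homogeneous but non-isolated clusters, which is the weakness the abstract attributes to minimizing the within-clusters deviance alone.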
Pages: 210-226
Number of pages: 17