Reduced k-means clustering with MCA in a low-dimensional space

Times cited: 0
Authors
Mitsuhiro, Masaki [1]
Yadohisa, Hiroshi [2]
Affiliations
[1] Nikkei Research Inc., Chiyoda-ku, Tokyo 101-0047, Japan
[2] Doshisha University, Department of Culture and Information Science, Kyotanabe, Kyoto 610-0394, Japan
Keywords
Categorical data; Simultaneous analysis; ALS; Factorial; Reduction
DOI
10.1007/s00180-014-0544-8
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
In the two-step sequential approach called tandem analysis, the variables are first reduced in dimension and a clustering algorithm is then applied to the estimated object scores. The reduction step, however, may obscure or mask taxonomic information (Arabie and Hubert in Handbook of marketing research. Blackwell, Oxford, 1994). As an alternative to tandem analysis, Hwang et al. (Psychometrika 71: 161-171, 2006) proposed an approach that combines two methods for categorical data; however, their method does not remove the trivial solution, in which the object scores in the first dimension are estimated as a vector of ones that carries no information. In this study, we propose a method for clustering objects described by categorical variables in a low-dimensional space. The proposed method performs multidimensional nonmetric principal component analysis and k-means clustering of categorical data simultaneously; that is, it reduces dimensionality through category quantifications while clustering the resulting object scores. Because object scores and variable categories are displayed together, the relationships between objects and categories can be interpreted for each cluster. The method is compared with tandem clustering on simulated data and is also applied to real-world data.
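A minimal sketch of the tandem baseline described above, and of the meaningless first dimension, in plain Python. It is not the authors' simultaneous method. Assumptions: MCA is fitted in a single dimension with Gifi-style alternating least squares (homogeneity analysis) updates, the clustering step is plain Lloyd k-means, and the names `homals_1d` and `kmeans_1d` and the toy data are hypothetical. Running the MCA updates without centering makes the object scores collapse to a constant vector of +1 or -1, the trivial first dimension; centering excludes it before the scores are clustered.

```python
# Hypothetical illustration of tandem analysis (reduce, then cluster);
# not the simultaneous method proposed in the paper.
import math
import random


def homals_1d(data, n_iter=200, center=True, seed=0):
    """One-dimensional homogeneity analysis (MCA) fitted by alternating least squares.

    data: list of objects, each a tuple holding one category label per variable.
    Returns (scores, quantifications); quantifications[j][c] is the score of
    category c of variable j. With center=False the updates drift toward the
    trivial solution in which every object score is equal (+1 or -1).
    """
    n, m = len(data), len(data[0])
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    quant = []
    for _ in range(n_iter):
        # Category step: each category gets the mean score of the objects in it.
        quant = []
        for j in range(m):
            sums, counts = {}, {}
            for i in range(n):
                c = data[i][j]
                sums[c] = sums.get(c, 0.0) + x[i]
                counts[c] = counts.get(c, 0) + 1
            quant.append({c: sums[c] / counts[c] for c in sums})
        # Object step: each object gets the mean quantification of its categories.
        x = [sum(quant[j][data[i][j]] for j in range(m)) / m for i in range(n)]
        if center:
            # Removing the mean excludes the constant (trivial) direction.
            mean = sum(x) / n
            x = [v - mean for v in x]
        # Rescale so that the sum of squared scores equals n.
        norm = math.sqrt(sum(v * v for v in x) / n)
        if norm == 0.0:
            raise ValueError("object scores collapsed to zero")
        x = [v / norm for v in x]
    return x, quant


def kmeans_1d(scores, k, n_iter=100):
    """Lloyd's k-means on one-dimensional scores, started at evenly spaced order statistics."""
    s = sorted(scores)
    centers = [s[(2 * t + 1) * len(s) // (2 * k)] for t in range(k)]
    labels = [0] * len(scores)
    for _ in range(n_iter):
        # Assign each score to its nearest center.
        labels = [min(range(k), key=lambda t: (v - centers[t]) ** 2) for v in scores]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new = []
        for t in range(k):
            members = [v for v, lab in zip(scores, labels) if lab == t]
            new.append(sum(members) / len(members) if members else centers[t])
        if new == centers:
            break
        centers = new
    return labels, centers


if __name__ == "__main__":
    # Hypothetical toy data: 10 respondents, 3 categorical variables.
    data = ([("a", "x", "p")] * 4 + [("a", "y", "p")]
            + [("b", "y", "q")] * 4 + [("b", "x", "q")])
    trivial, _ = homals_1d(data, center=False)
    print("uncentered scores:", [round(v, 3) for v in trivial])
    scores, _ = homals_1d(data, center=True)   # step 1: dimension reduction
    labels, _ = kmeans_1d(scores, k=2)         # step 2: cluster the object scores
    print("clusters:", labels)
```

In this sketch the dimension is fixed before any clustering happens, so nothing steers the reduction toward cluster structure; that is the weakness of tandem analysis that a simultaneous method targets.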
Pages: 463-475
Page count: 13
References (18 in total)
[1] Adachi K (2011) NONMETRIC MULTIVARIA
[2] Arabie P (1994) Advanced methods in marketing research, p 160
[3] Iodice D'Enza A, Van de Velden M, Palumbo F (2014) On Joint Dimension Reduction and Clustering of Categorical Data. In: Analysis and Modeling of Complex Data in Behavioral and Social Sciences, pp 161-169
[4] De Soete G (1994) NEW APPROACHES CLASS, p 212. DOI 10.1007/978-3-642-51175-2_24
[5] Gifi A (1990) NONLINEAR MULTIVARIA
[6] Hubert L, Arabie P (1985) Comparing partitions. Journal of Classification 2(2-3):193-218
[7] Hwang H (2010) Behaviormetrika 37:111. DOI 10.2333/bhmk.37.111
[8] Hwang H, Dillon WR (2010) Simultaneous Two-Way Clustering of Multiple Correspondence Analysis. Multivariate Behavioral Research 45(1):186-208
[9] Hwang HS, Dillon WR, Takane Y (2006) An extension of multiple correspondence analysis for identifying heterogeneous subgroups of respondents. Psychometrika 71(1):161-171
[10] Iodice D'Enza A (2013) Comput Stat 28:1