Visualization and clustering of categorical data with probabilistic self-organizing map

Cited by: 4
Authors
Lebbah, Mustapha [2]
Benabdeslem, Khalid [1]
Affiliations
[1] Univ Lyon 1, EA4125, LIESP, F-69622 Lyon, France
[2] Univ Paris 13, LIPN, UMR 7030, CNRS, F-93430 Villetaneuse, France
Keywords
Probabilistic self-organizing map; Categorical variables; Visualization; EM algorithm; MODEL; BERNOULLI
DOI
10.1007/s00521-009-0299-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a self-organizing map dedicated to the clustering, analysis, and visualization of categorical data. Topological maps usually handle categorical data through an encoding stage: the categorical data are converted into numerical vectors, and a traditional numerical algorithm (SOM) is then run. We propose instead a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no encoding of the variables is needed. We evaluate the effectiveness of our model on four examples using real data. The experiments show that the model provides good-quality results on categorical data.
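As a rough illustration of what "neurons represented by probability tables" could look like, the sketch below stores one probability table per categorical variable in each neuron and scores an observation directly from its category labels, with no numerical encoding. It is not the authors' algorithm: the variable levels, the table values, the function names, the assumption that variables are independent given a neuron, and the uniform prior over neurons are all hypothetical, and the topological neighbourhood that makes the model a map is left out.

```python
import numpy as np

# Hypothetical toy setting: two categorical variables and their levels.
LEVELS = [["red", "green", "blue"], ["small", "large"]]

# Each neuron holds one probability table per variable (each table sums to 1).
# The values are made up for illustration.
NEURONS = [
    [np.array([0.8, 0.1, 0.1]), np.array([0.9, 0.1])],
    [np.array([0.1, 0.1, 0.8]), np.array([0.2, 0.8])],
]


def log_likelihood(neuron, observation):
    """Log-probability of an observation under one neuron.

    Assumes (for this sketch only) that the variables are independent
    given the neuron, so the per-variable table entries multiply.
    """
    total = 0.0
    for table, levels, value in zip(neuron, LEVELS, observation):
        total += np.log(table[levels.index(value)])
    return total


def responsibilities(neurons, observation):
    """Posterior probability of each neuron for one observation.

    Uses a uniform prior over neurons; this is the kind of quantity an
    EM-style E-step would compute.
    """
    logp = np.array([log_likelihood(n, observation) for n in neurons])
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return p / p.sum()


posterior = responsibilities(NEURONS, ("red", "small"))
best_neuron = int(posterior.argmax())
```

Here the observation `("red", "small")` is looked up in the tables by label, which is the point of contrast with the encoding stage described in the abstract.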
Pages: 393-404
Number of pages: 12