A PROBABILISTIC SELF-ORGANIZING MAP FOR BINARY DATA TOPOGRAPHIC CLUSTERING

Cited by: 8
Authors
Lebbah, Mustapha [1]
Bennani, Younes [1]
Rogovschi, Nicoleta [1]
Affiliations
[1] Univ Paris 13, CNRS, LIPN, UMR 7030, F-93430 Villetaneuse, France
Keywords
Self-organizing map; Bernoulli distribution; binary data
DOI
10.1142/S1469026808002351
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a probabilistic self-organizing map for topographic clustering, analysis, and visualization of multivariate binary data, or of categorical data encoded in binary form. We propose a probabilistic formalism dedicated to binary data in which each cell is represented by a Bernoulli distribution. Each cell is characterized by a prototype, expressed in the same binary coding as the data space, and by the probability of differing from this prototype. The learning algorithm that we propose, Bernoulli on self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show good topological ordering and homogeneous clusters.
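The cell model described in the abstract can be illustrated with a small sketch. It assumes one scalar probability `eps` per cell (the abstract does not say whether the probability is per cell or per variable), and it computes posteriors for a plain Bernoulli mixture, leaving out the map neighborhood that the full method would use. All function names are hypothetical and are not taken from the paper.

```python
def hamming(x, w):
    """Count the positions where binary vectors x and w differ."""
    return sum(xi != wi for xi, wi in zip(x, w))


def bernoulli_cell_likelihood(x, w, eps):
    """Likelihood of x under a cell with binary prototype w.

    Assumption: each bit differs from the prototype independently
    with the same probability eps, so
    p(x | w, eps) = eps**H * (1 - eps)**(n - H), with H = hamming(x, w).
    """
    n = len(x)
    h = hamming(x, w)
    return (eps ** h) * ((1.0 - eps) ** (n - h))


def responsibilities(x, prototypes, eps_list, priors):
    """E-step style posterior over cells for a plain mixture.

    Simplification: no topological neighborhood is applied here.
    """
    joint = [
        p * bernoulli_cell_likelihood(x, w, e)
        for w, e, p in zip(prototypes, eps_list, priors)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Usage: one observation, two cells with equal priors.
x = [1, 0, 1, 1]
prototypes = [[1, 0, 1, 1], [0, 1, 0, 0]]
post = responsibilities(x, prototypes, [0.1, 0.1], [0.5, 0.5])
```

Here the observation matches the first prototype exactly, so nearly all posterior mass falls on the first cell. A smaller `eps` makes a cell more concentrated around its prototype.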
Pages: 363-383
Number of pages: 21