Mixing numerical and categorical data in a Self-Organizing Map by means of frequency neurons

Cited by: 13
Authors
del Coso, Carmelo [1]
Fustes, Diego [1]
Dafonte, Carlos [1]
Novoa, Francisco J. [1]
Rodriguez-Pedreira, Jose M. [1]
Arcay, Bernardino [1]
Affiliation
[1] Univ A Coruna, Fac Informat, La Coruna 15071, Spain
Keywords
Self-Organizing Map; Categorical data; Mixed data; Big data
DOI
10.1016/j.asoc.2015.06.058
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Even though Self-Organizing Maps (SOMs) constitute a powerful and essential tool for pattern recognition and data mining, the common SOM algorithm is not apt for processing categorical data, which is present in many real datasets. It is for this reason that the categorical values are commonly converted into a binary code, a solution that unfortunately distorts the network training and the posterior analysis. The present work proposes a SOM architecture that directly processes the categorical values, without the need of any previous transformation. This architecture is also capable of properly mixing numerical and categorical data, in such a manner that all the features adopt the same weight. The proposed implementation is scalable and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets. © 2015 Elsevier B.V. All rights reserved.
Pages: 246-254
Number of pages: 9