Distance-based clustering of mixed data

Cited by: 47
Authors
van de Velden, Michel [1]
D'Enza, Alfonso Iodice [2]
Markos, Angelos [3]
Affiliations
[1] Erasmus Univ, Dept Econ, Rotterdam, Netherlands
[2] Univ Cassino & Southern Lazio, Dept Econ & Law, Cassino, FR, Italy
[3] Democritus Univ Thrace, Dept Primary Educ, Xanthi, Greece
Keywords
cluster analysis; dimension reduction; distance based methods; joint dimension reduction and clustering; mixed data; MIXTURE MODEL; DISCRIMINANT-ANALYSIS; GENERAL COEFFICIENT; COMPONENT ANALYSIS; ALGORITHM; SIMILARITY; FACTORIAL;
DOI
10.1002/wics.1456
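Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes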
020208 ; 070103 ; 0714 ;
Abstract
Cluster analysis comprises several unsupervised techniques that aim to identify a subgroup (cluster) structure underlying the observations of a data set. The desired cluster allocation assigns similar observations to the same subgroup. Depending on the field of application and on domain-specific requirements, different approaches exist to tackle the clustering problem. In distance-based clustering, a distance metric is used to determine the similarity between data objects. The distance metric can be used to cluster observations by considering the distances between objects directly or by considering distances between objects and cluster centroids (or some other cluster representative points). Most distance metrics, and hence most distance-based clustering methods, work with either continuous-only or categorical-only data. In applications, however, observations are often described by a combination of continuous and categorical variables. Such data sets are referred to as mixed or mixed-type data. In this review, we consider different methods for distance-based cluster analysis of mixed data. In particular, we distinguish three streams: basic data preprocessing (where all variables are converted to the same scale), the use of specific distance measures for mixed data, and so-called joint data reduction methods (a combination of dimension reduction and clustering) specifically designed for mixed data.
This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis; Statistical and Graphical Methods of Data Analysis > Dimension Reduction
Pages: 12
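The abstract's second stream (a distance measure defined directly on mixed data, followed by clustering on the pairwise distances) can be illustrated with a minimal sketch. It assumes a Gower-style distance, which is one commonly used choice: range-scaled absolute differences for numeric variables and simple matching for categorical ones, averaged over variables. Whether and how the review treats this particular measure is not stated in this record. The helper names (`column_ranges`, `gower_distance`, `k_medoids`) and the naive "first k rows" initialization are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple, Union

Value = Union[float, str]


def column_ranges(data: Sequence[Sequence[Value]]) -> List[Optional[float]]:
    """Range (max - min) of each numeric column; None marks a categorical column."""
    ranges: List[Optional[float]] = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        if isinstance(col[0], str):
            ranges.append(None)  # assumption: string values mean "categorical"
        else:
            ranges.append(max(col) - min(col))
    return ranges


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[Optional[float]]) -> float:
    """Gower-style distance: mean of per-variable dissimilarities in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0  # simple matching for categories
        elif r > 0:
            total += abs(x - y) / r          # range-scaled absolute difference
        # a constant numeric column (r == 0) contributes nothing
    return total / len(ranges)


def k_medoids(data: Sequence[Sequence[Value]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Naive k-medoids on the Gower-style distance matrix (illustrative only)."""
    ranges = column_ranges(data)
    n = len(data)
    dist = [[gower_distance(data[i], data[j], ranges) for j in range(n)]
            for i in range(n)]

    def assign(medoids: List[int]) -> List[int]:
        # Each observation joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))  # assumption: first k rows as starting medoids
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids
```

Because the clustering step only reads the distance matrix, the same loop would work with any other mixed-data distance substituted for `gower_distance`. A practical analysis would also need a better initialization and some handling of missing values, both of which this sketch omits.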
Related papers
50 entries in total
  • [31] An incremental mixed data clustering method using a new distance measure
    Noorbehbahani, Fakhroddin
    Mousavi, Sayyed Rasoul
    Mirzaei, Abdolreza
    SOFT COMPUTING, 2015, 19 (03) : 731 - 743
  • [32] Clustering mixed-type data using a probabilistic distance algorithm
    Tortora, Cristina
    Palumbo, Francesco
    APPLIED SOFT COMPUTING, 2022, 130
  • [33] A Hybrid Distance-Based and Naive Bayes Online Classifier
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2015), PT II, 2015, 9330 : 213 - 222
  • [34] A Mixed Similarity Measure in Near-Linear Computational Complexity for Distance-Based Methods
    Nguyen, Ngoc Binh
    Ho, Tu Bao
    LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 211 - 220
  • [35] A New Adaptive Mixture Distance-Based Improved Density Peaks Clustering for Gearbox Fault Diagnosis
    Sharma, Krishna Kumar
    Seal, Ayan
    Yazidi, Anis
    Krejcar, Ondrej
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [36] A Family of the Online Distance-Based Classifiers
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, 2014, 8398 : 177 - 186
  • [37] Distance-based functions for image comparison
    Di Gesù, V
    Starovoitov, V
    PATTERN RECOGNITION LETTERS, 1999, 20 (02) : 207 - 214
  • [38] Adaptive Resonance Theory-based Clustering for Handling Mixed Data
    Masuyama, Naoki
    Nojima, Yusuke
    Ishibuchi, Hisao
    Liu, Zongying
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [39] Distance-based genome rearrangement phylogeny
    Wang, Li-San
    Warnow, Tandy
    Moret, Bernard M. E.
    Jansen, Robert K.
    Raubeson, Linda A.
    JOURNAL OF MOLECULAR EVOLUTION, 2006, 63 (04) : 473 - 483
  • [40] Single Landmark Distance-Based Navigation
    Nguyen, Thien-Minh
    Qiu, Zhirong
    Cao, Muqing
    Nguyen, Thien Hoang
    Xie, Lihua
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2020, 28 (05) : 2021 - 2028