Distance-based clustering of mixed data

Times cited: 47

Authors:

van de Velden, Michel [1]

D'Enza, Alfonso Iodice [2]

Markos, Angelos [3]

Affiliations:

[1] Erasmus Univ, Dept Econ, Rotterdam, Netherlands

[2] Univ Cassino & Southern Lazio, Dept Econ & Law, Cassino, FR, Italy

[3] Democritus Univ Thrace, Dept Primary Educ, Xanthi, Greece

Source:

WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS | 2019 / Vol. 11 / No. 3

Keywords:

cluster analysis; dimension reduction; distance based methods; joint dimension reduction and clustering; mixed data; MIXTURE MODEL; DISCRIMINANT-ANALYSIS; GENERAL COEFFICIENT; COMPONENT ANALYSIS; ALGORITHM; SIMILARITY; FACTORIAL;

DOI:

10.1002/wics.1456

Chinese Library Classification (CLC) numbers:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

Cluster analysis comprises several unsupervised techniques that aim to identify a subgroup (cluster) structure underlying the observations of a data set. The desired cluster allocation assigns similar observations to the same subgroup. Depending on the field of application and on domain-specific requirements, different approaches to the clustering problem exist. In distance-based clustering, a distance metric is used to determine the similarity between data objects. Observations can then be clustered either by considering the distances between objects directly or by considering the distances between objects and cluster centroids (or other representative points of each cluster). Most distance metrics, and hence most distance-based clustering methods, work with either continuous-only or categorical-only data. In applications, however, observations are often described by a combination of continuous and categorical variables. Such data sets can be referred to as mixed or mixed-type data. In this review, we consider different methods for distance-based cluster analysis of mixed data. In particular, we distinguish three streams, ranging from basic data preprocessing (where all variables are converted to the same scale), through the use of specific distance measures for mixed data, to so-called joint data reduction methods (a combination of dimension reduction and clustering) specifically designed for mixed data.

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis
Statistical and Graphical Methods of Data Analysis > Dimension Reduction
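The second stream in the abstract, a dedicated distance measure for mixed data, can be illustrated with Gower's general coefficient, one widely used choice, combined with a simple medoid-based clustering step. The sketch below is an illustration only, not the authors' method. It assumes equal variable weights, no missing values, a naive "first k rows" initialisation, and the alternating (Voronoi-iteration) k-medoids update rather than the full PAM swap phase. The names `gower_matrix`, `assign` and `k_medoids` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[float, str]


def gower_matrix(rows: Sequence[Sequence[Value]],
                 is_categorical: Sequence[bool]) -> List[List[float]]:
    """Pairwise Gower dissimilarities (equal weights, no missing values)."""
    n, p = len(rows), len(is_categorical)
    # Range of each continuous variable; unused (0.0) for categorical ones.
    ranges = []
    for j in range(p):
        if is_categorical[j]:
            ranges.append(0.0)
        else:
            col = [float(r[j]) for r in rows]
            ranges.append(max(col) - min(col))
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if is_categorical[j]:
                    # Simple matching: 0 if the categories agree, 1 otherwise.
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
                elif ranges[j] > 0:
                    # Range-normalised absolute difference, always in [0, 1].
                    total += abs(float(rows[a][j]) - float(rows[b][j])) / ranges[j]
                # A constant continuous variable contributes 0.
            dist[a][b] = dist[b][a] = total / p
    return dist


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each observation with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist: List[List[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of observations")
    medoids = list(range(k))  # naive deterministic start: first k rows
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # New medoid: member with the smallest total within-cluster dissimilarity.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


if __name__ == "__main__":
    # Hypothetical toy data: one continuous variable (age), one categorical (colour).
    data = [(25.0, "red"), (27.0, "red"), (26.0, "red"),
            (60.0, "blue"), (62.0, "blue"), (61.0, "blue")]
    d = gower_matrix(data, is_categorical=[False, True])
    labels, medoids = k_medoids(d, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Because the clustering step only consumes the dissimilarity matrix, the same `k_medoids` loop could be reused with any other mixed-data distance. In practice one would typically prefer a library implementation with a better initialisation and the PAM swap phase.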

Pages: 12