Distance Metrics and Clustering Methods for Mixed-type Data

Cited by: 43
Authors
Foss, Alexander H. [1 ]
Markatou, Marianthi [1 ]
Ray, Bonnie [2 ]
Affiliations
[1] Univ Buffalo, Dept Biostat, 706 Kimball Tower, Buffalo, NY 14214 USA
[2] Arenadotio, New York, NY USA
Keywords
Discretisation; dummy coding; Gower's distance; k-means clustering; machine learning; Mahalanobis distance; mixture model; multivariate data analysis; unsupervised learning; MIXTURE MODEL; DISCRIMINANT-ANALYSIS; MAHALANOBIS DISTANCE; CATEGORICAL VARIABLES; MAXIMUM-LIKELIHOOD; FINITE MIXTURES; LOCATION MODEL; INFORMATION; DENSITY; SELECTION;
DOI
10.1111/insr.12274
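Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code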
020208 ; 070103 ; 0714 ;
Abstract
In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. In order to identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on approaches to use under different scenarios are provided, along with potential directions for future research.
Pages: 80-109
Page count: 30
Related papers
50 in total
  • [1] Clustering mixed-type data using a probabilistic distance algorithm
    Tortora, Cristina
    Palumbo, Francesco
    APPLIED SOFT COMPUTING, 2022, 130
  • [2] Spectral Clustering of Mixed-Type Data
    Mbuga, Felix
    Tortora, Cristina
    STATS, 2022, 5 (01): 1 - 11
  • [3] A generalized multi-aspect distance metric for mixed-type data clustering
    Mousavi, Elahe
    Sehhati, Mohammadreza
    PATTERN RECOGNITION, 2023, 138
  • [4] Benchmarking distance-based partitioning methods for mixed-type data
    Efthymios Costa
    Ioanna Papatsouma
    Angelos Markos
    Advances in Data Analysis and Classification, 2023, 17 : 701 - 724
  • [5] Benchmarking distance-based partitioning methods for mixed-type data
    Costa, Efthymios
    Papatsouma, Ioanna
    Markos, Angelos
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (03) : 701 - 724
  • [6] Clustering of samples and variables with mixed-type data
    Hummel, Manuela
    Edelmann, Dominic
    Kopp-Schneider, Annette
    PLOS ONE, 2017, 12 (11)
  • [7] Genetic algorithm for clustering mixed-type data
    Yang, Shiueng-Bien
    Wu, Yung-Gi
    JOURNAL OF ELECTRONIC IMAGING, 2011, 20 (01)
  • [8] kamila: Clustering Mixed-Type Data in R and Hadoop
    Foss, Alexander H.
    Markatou, Marianthi
    JOURNAL OF STATISTICAL SOFTWARE, 2018, 83 (13): 1 - 44
  • [9] Clustering Approaches for Mixed-Type Data: A Comparative Study
    Ghattas, Badih
    San-Benito, Alvaro Sanchez
    JOURNAL OF PROBABILITY AND STATISTICS, 2025, 2025 (01)
  • [10] The quick dynamic clustering method for mixed-type data
    Ayuyev, V. V.
    Thura, A.
    Hlaing, N. N.
    Loginova, M. B.
    AUTOMATION AND REMOTE CONTROL, 2012, 73 (12) : 2083 - 2088