Estimating the number of clusters in a ranking data context

Cited by: 3
Authors
Calmon, Wilson [1]
Albi, Mariana [1]
Affiliation
[1] Fluminense Federal University, Institute of Mathematics and Statistics, BR-24210201 Niterói, RJ, Brazil
Keywords
Number of clusters; Ranking data; Plackett-Luce; Clustering; Ordinal classification
DOI
10.1016/j.ins.2020.09.056
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This study introduces two methods for estimating the number of clusters, specifically designed to identify the number of groups in a finite population of objects or items ranked by several judges, under the assumption that these judges belong to a homogeneous population. Both methods are based on a hierarchical version of the classical Plackett-Luce model in which the number of clusters is treated as an additional parameter. The methods neither require continuous score data nor restrict the number of clusters to be greater than one or less than the total number of objects, so they can be applied in a wide range of scenarios. The results of a large simulation study suggest that the proposed methods outperform well-established methodologies (Calinski & Harabasz, gap, Hartigan, Krzanowski & Lai, jump, and silhouette) as well as some recently proposed approaches (instability, quantization error modeling, slope, and utility): they achieve the highest percentages of correct estimates of the number of clusters and the smallest errors among the methods compared. We illustrate the proposed methods by analyzing a ranking dataset obtained from Formula One motor racing. © 2020 Elsevier Inc. All rights reserved.
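Both proposed methods build on the classical Plackett-Luce model for rankings. As a minimal sketch of that building block only (not the authors' hierarchical method, whose specification is not given in this abstract), the Python below computes the log-likelihood of one complete ranking: at each position, the next object is drawn with probability proportional to its worth among the objects not yet placed. The helper `clustered_loglik` is hypothetical and assumes, for illustration only, that objects in the same cluster share a single worth parameter and that judges rank independently; this is one simple way a cluster structure could enter the model, not the paper's formulation.

```python
import math
from typing import Sequence


def plackett_luce_loglik(ranking: Sequence[int], worth: Sequence[float]) -> float:
    """Log-probability of one complete ranking under the classical Plackett-Luce model.

    ranking: object indices ordered from first place to last place.
    worth:   positive worth parameter of every object, indexed by object.
    """
    # Total worth of the objects that have not been placed yet.
    remaining = sum(worth[i] for i in ranking)
    loglik = 0.0
    for obj in ranking:
        # The next object is chosen with probability worth[obj] / remaining.
        loglik += math.log(worth[obj]) - math.log(remaining)
        remaining -= worth[obj]
    return loglik


def clustered_loglik(
    rankings: Sequence[Sequence[int]],
    labels: Sequence[int],
    cluster_worth: Sequence[float],
) -> float:
    """Hypothetical helper: total log-likelihood when objects share their cluster's worth.

    ASSUMPTION (illustration only, not the paper's model): labels[j] is the cluster
    of object j, and every object in cluster c has worth cluster_worth[c].
    """
    worth = [cluster_worth[c] for c in labels]
    # Judges are assumed to rank independently, so their log-likelihoods add up.
    return sum(plackett_luce_loglik(r, worth) for r in rankings)
```

For example, with worths 2, 1, 1 the ranking (0, 1, 2) has probability 2/4 * 1/2 * 1 = 0.25, and `plackett_luce_loglik([0, 1, 2], [2.0, 1.0, 1.0])` returns log(0.25). Comparing `clustered_loglik` across candidate partitions with different numbers of clusters is only a starting point; how the number of clusters is actually estimated is the subject of the paper itself.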
Pages: 977-995
Number of pages: 19
Related papers (entries 31-40 of 50)
  • [31] Effects of Resampling in Determining the Number of Clusters in a Data Set
    Dangl, Rainer
    Leisch, Friedrich
    Journal of Classification, 2020, 37 (03): 558-583
  • [32] Automatically Determining the Number of Clusters in Unlabeled Data Sets
    Wang, Liang
    Leckie, Christopher
    Ramamohanarao, Kotagiri
    Bezdek, James
    IEEE Transactions on Knowledge and Data Engineering, 2009, 21 (03): 335-350
  • [34] Evaluation of Coefficients for Determining the Optimal Number of Clusters in Cluster Analysis on Real Data Sets
    Loster, Tomas
    9th International Days of Statistics and Economics, 2015: 1014-1023
  • [35] Automatic Determination of the Appropriate Number of Clusters for Multispectral Image Data
    Koonsanit, Kitti
    Jaruskulchai, Chuleerat
    IEICE Transactions on Information and Systems, 2012, E95D (05): 1256-1263
  • [36] An examination of indexes for determining the number of clusters in binary data sets
    Dimitriadou, Evgenia
    Dolničar, Sara
    Weingessel, Andreas
    Psychometrika, 2002, 67: 137-159
  • [37] An evolutionary algorithm for clustering data streams with a variable number of clusters
    Silva, Jonathan de Andrade
    Hruschka, Eduardo Raul
    Gama, Joao
    Expert Systems with Applications, 2017, 67: 228-238
  • [38] Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters
    Jia, Hong
    Cheung, Yiu-Ming
    IEEE Transactions on Neural Networks and Learning Systems, 2018, 29 (08): 3308-3325
  • [39] A Support System for Clustering Data Streams with a Variable Number of Clusters
    Silva, Jonathan de Andrade
    Hruschka, Eduardo Raul
    ACM Transactions on Autonomous and Adaptive Systems, 2016, 11 (02)
  • [40] A cluster approach to analyze preference data: Choice of the number of clusters
    Sahmer, K
    Vigneau, E
    Qannari, EM
    Food Quality and Preference, 2006, 17 (3-4): 257-265