Estimating the number of clusters in a ranking data context

Cited by: 3
Authors
Calmon, Wilson [1 ]
Albi, Mariana [1 ]
Affiliations
[1] Fluminense Fed Univ, Inst Math & Stat, BR-24210201 Niteroi, RJ, Brazil
Keywords
Number of clusters; Ranking data; Plackett-Luce; Clustering; Ordinal classification;
DOI
10.1016/j.ins.2020.09.056
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study introduces two methods for estimating the number of clusters specially designed to identify the number of groups in a finite population of objects or items ranked by several judges under the assumption that these judges belong to a homogeneous population. The proposed methods are both based on a hierarchical version of the classical Plackett-Luce model in which the number of clusters is set as an additional parameter. These methods do not require continuous score data to be available or restrict the number of clusters to be greater than one or less than the total number of objects, thereby enabling their application in a wide range of scenarios. The results of a large simulation study suggest that the proposed methods outperform well-established methodologies (Calinski & Harabasz, gap, Hartigan, Krzanowski & Lai, jump, and silhouette) as well as some recently proposed approaches (instability, quantization error modeling, slope, and utility). They realize the highest percentages of correct estimates of the number of clusters and the smallest errors compared with these well-established methodologies. We illustrate the proposed methods by analyzing a ranking dataset obtained from Formula One motor racing. © 2020 Elsevier Inc. All rights reserved.
Pages: 977-995
Page count: 19
Related papers
50 in total
  • [41] An examination of indexes for determining the number of clusters in binary data sets
    Dimitriadou, E
    Dolnicar, S
    Weingessel, A
    PSYCHOMETRIKA, 2002, 67 (01) : 137 - 159
  • [42] On the number of clusters
    Hardy, A
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 23 (01) : 83 - 96
  • [43] Estimation of the number of clusters and influence zones
    Herbin, M
    Bonnet, N
    Vautrot, P
    PATTERN RECOGNITION LETTERS, 2001, 22 (14) : 1557 - 1568
  • [44] Unsupervised Ranking and Characterization of Differentiated Clusters
    Cazzanti, Luca
    Mehanian, Courosh
    Penzotti, Julie
    Scott, Doug
    Downs, Oliver
    2013 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS: BIG DATA, EMERGENT THREATS, AND DECISION-MAKING IN SECURITY INFORMATICS, 2013, : 266 - 266
  • [45] Automatic recovering the number k of clusters in the data by active query selection
    Sousa, Herio
    de Souto, Marcilio C. P.
    Kuroshu, Reginaldo M.
    Lorena, Ana Carolina
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 1021 - 1029
  • [46] DETERMINING THE OPTIMAL NUMBER OF CLUSTERS IN CLUSTER ANALYSIS
    Loster, Tomas
    10TH INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2016, : 1078 - 1090
  • [47] Can the Number of Clusters Be Determined by External Indices?
    Rezaei, Mohammad
    Franti, Pasi
    IEEE ACCESS, 2020, 8 : 89239 - 89257
  • [48] On finding the number of clusters
    Kothari, R
    Pitts, D
    PATTERN RECOGNITION LETTERS, 1999, 20 (04) : 405 - 416
  • [49] Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering
    Arima, Chinatsu
    Hakamada, Kazumi
    Okamoto, Masahiro
    Hanai, Taizo
    JOURNAL OF BIOSCIENCE AND BIOENGINEERING, 2008, 105 (03) : 273 - 281
  • [50] On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters
    Papayiannis, G. I.
    Domazakis, G. N.
    Drivaliaris, D.
    Koukoulas, S.
    Tsekrekos, A. E.
    Yannacopoulos, A. N.
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2021, 91 (13) : 2569 - 2594