Estimating the number of clusters in a ranking data context

Cited by: 3
Authors
Calmon, Wilson [1 ]
Albi, Mariana [1 ]
Affiliations
[1] Fluminense Fed Univ, Inst Math & Stat, BR-24210201 Niteroi, RJ, Brazil
Keywords
Number of clusters; Ranking data; Plackett-Luce; Clustering; Ordinal classification;
DOI
10.1016/j.ins.2020.09.056
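Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract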
This study introduces two methods for estimating the number of clusters specially designed to identify the number of groups in a finite population of objects or items ranked by several judges under the assumption that these judges belong to a homogeneous population. The proposed methods are both based on a hierarchical version of the classical Plackett-Luce model in which the number of clusters is set as an additional parameter. These methods do not require continuous score data to be available or restrict the number of clusters to be greater than one or less than the total number of objects, thereby enabling their application in a wide range of scenarios. The results of a large simulation study suggest that the proposed methods outperform well-established methodologies (Calinski & Harabasz, gap, Hartigan, Krzanowski & Lai, jump, and silhouette) as well as some recently proposed approaches (instability, quantization error modeling, slope, and utility). They realize the highest percentages of correct estimates of the number of clusters and the smallest errors compared with these well-established methodologies. We illustrate the proposed methods by analyzing a ranking dataset obtained from Formula One motor racing. © 2020 Elsevier Inc. All rights reserved.
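For orientation, the classical Plackett-Luce model named in the abstract assigns a ranking the product of successive choice probabilities: at each position, the item placed next is chosen with probability proportional to its worth among the items not yet placed. The sketch below illustrates that classical model only. It is not the authors' hierarchical version and does not estimate the number of clusters; the function name and the `worth` parameter are assumptions introduced here for illustration.

```python
import math
from typing import Sequence


def plackett_luce_log_prob(ranking: Sequence[int], worth: Sequence[float]) -> float:
    """Log-probability of a full ranking under the classical Plackett-Luce model.

    ranking[0] is the index of the item placed first, ranking[1] the second, etc.
    worth[i] is the positive worth parameter of item i (hypothetical name).
    """
    log_p = 0.0
    for pos, item in enumerate(ranking):
        # Total worth of the items still available at this position.
        denom = sum(worth[j] for j in ranking[pos:])
        log_p += math.log(worth[item]) - math.log(denom)
    return log_p


# Three items with worths 3, 2, 1; item 0 ranked first, then 1, then 2.
example_worth = [3.0, 2.0, 1.0]
example_log_p = plackett_luce_log_prob([0, 1, 2], example_worth)
```

In this example the probability is (3/6) x (2/3) x (1/1) = 1/3, so `math.exp(example_log_p)` is approximately 0.333.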
Pages: 977-995
Number of pages: 19