Estimating the number of clusters in a ranking data context

Cited by: 3
Authors
Calmon, Wilson [1]
Albi, Mariana [1]
Affiliation
[1] Fluminense Fed Univ, Inst Math & Stat, BR-24210201 Niteroi, RJ, Brazil
Keywords
Number of clusters; Ranking data; Plackett-Luce; Clustering; Ordinal classification
DOI
10.1016/j.ins.2020.09.056
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study introduces two methods for estimating the number of clusters, specifically designed to identify the number of groups in a finite population of objects or items ranked by several judges, under the assumption that the judges belong to a homogeneous population. Both methods are based on a hierarchical version of the classical Plackett-Luce model in which the number of clusters is treated as an additional parameter. They neither require continuous score data nor restrict the number of clusters to be greater than one or less than the total number of objects, which makes them applicable in a wide range of scenarios. The results of a large simulation study suggest that the proposed methods outperform well-established methodologies (Calinski & Harabasz, gap, Hartigan, Krzanowski & Lai, jump, and silhouette) as well as several recently proposed approaches (instability, quantization error modeling, slope, and utility): they achieve the highest percentages of correct estimates of the number of clusters and the smallest errors. We illustrate the proposed methods by analyzing a ranking dataset from Formula One motor racing. (c) 2020 Elsevier Inc. All rights reserved.
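To make the model underlying both methods more concrete, the following is a minimal sketch of the classical (non-hierarchical) Plackett-Luce likelihood. At each position, the next item is drawn from those not yet ranked with probability proportional to its positive worth parameter, and judges are treated as independent. It does not implement the authors' hierarchical model or their estimators of the number of clusters. The helper `expand_cluster_worths`, which lets all items in one cluster share a single worth, is a hypothetical illustration of how a cluster structure could enter the model and is not the paper's formulation. All function names are illustrative assumptions.

```python
import itertools
import math
from typing import Sequence


def plackett_luce_log_prob(ranking: Sequence[int], worth: Sequence[float]) -> float:
    """Log-probability of one complete ranking under the classical Plackett-Luce model.

    ranking[0] is the item ranked first; worth[i] > 0 is the worth of item i.
    """
    log_p = 0.0
    for pos, item in enumerate(ranking):
        # Items still unranked at this stage: the chosen one and all items after it.
        denominator = sum(worth[j] for j in ranking[pos:])
        log_p += math.log(worth[item]) - math.log(denominator)
    return log_p


def log_likelihood(rankings: Sequence[Sequence[int]], worth: Sequence[float]) -> float:
    """Sum of log-probabilities over independent judges' rankings."""
    return sum(plackett_luce_log_prob(r, worth) for r in rankings)


def expand_cluster_worths(cluster_of: Sequence[int],
                          cluster_worth: Sequence[float]) -> list[float]:
    """Hypothetical helper: each item inherits the worth of the cluster it belongs to."""
    return [cluster_worth[c] for c in cluster_of]


if __name__ == "__main__":
    # Four items in two hypothetical clusters: {0, 1} and {2, 3}.
    worth = expand_cluster_worths([0, 0, 1, 1], [3.0, 1.0])
    judges = [[0, 1, 2, 3], [1, 0, 3, 2], [0, 2, 1, 3]]
    print(log_likelihood(judges, worth))

    # Sanity check: probabilities over all 4! rankings should sum to 1.
    total = sum(math.exp(plackett_luce_log_prob(p, worth))
                for p in itertools.permutations(range(4)))
    print(round(total, 12))
```

Comparing such likelihoods across candidate cluster assignments is one ingredient a model-based choice of the number of clusters could use. The abstract does not describe how the paper actually scores candidate cluster counts.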
Pages: 977-995
Number of pages: 19