A review of cluster analysis techniques and their uses in library and information science research: k-means and k-medoids clustering

Cited by: 28
Authors
Lund, Brady [1]
Ma, Jinxuan [1]
Affiliations
[1] Emporia State Univ, Emporia, KS 66801 USA
Keywords
Clustering; Library and information science; Research methods; Cluster analysis; Data analysis; K-means
DOI
10.1108/PMM-05-2021-0026
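Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract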
Purpose - This literature review explores the definitions and characteristics of cluster analysis, a machine-learning technique that is frequently implemented to identify groupings in big datasets, and its applicability to library and information science (LIS) research. This overview is intended for researchers who are interested in expanding their data analysis repertory to include cluster analysis, rather than for existing experts in this area.
Design/methodology/approach - A review of LIS articles included in the Library and Information Source (EBSCO) database that employ cluster analysis is performed. An overview of cluster analysis in general (how it works from a statistical standpoint, and how it can be performed by researchers), the most popular cluster analysis techniques and the uses of cluster analysis in LIS is presented.
Findings - The number of LIS studies that employ a cluster analytic approach has grown from about 5 per year in the early 2000s to an average of 35 studies per year in the mid- and late-2010s. The journal Scientometrics has the most articles published within LIS that use cluster analysis (102 studies). Scientometrics is the most common subject area to employ a cluster analytic approach (152 studies). The findings of this review indicate that cluster analysis could make LIS research more accessible by providing an innovative and insightful process of knowledge discovery.
Originality/value - This review is the first to present cluster analysis as an accessible data analysis approach, specifically from an LIS perspective.
Pages: 161-173
Page count: 13
Related papers
50 items in total
  • [41] Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering
    Zhou, Kaile
    Yang, Shanlin
    PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (01) : 455 - 466
  • [42] Improved research to k-means initial cluster centers
    Zhang Min
    Duan Kai-fei
    2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 348 - 352
  • [43] Clustering Research on Ship Fault Phenomena Based on K-means Algorithm
    Wei, Guo-dong
    Luo, Zhong
    Yu, Xiang
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 4412 - 4415
  • [44] A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data
    Dierckens, Karl E.
    Harrison, Adrian B.
    Leung, Carson K.
    Pind, Adrienne V.
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017, : 925 - 932
  • [45] On the Added Value of Bootstrap Analysis for K-Means Clustering
    Joeri Hofmans
    Eva Ceulemans
    Douglas Steinley
    Iven Van Mechelen
    Journal of Classification, 2015, 32 : 268 - 284
  • [46] Analysis of K-means clustering for Human Capital Trends
    Sharma, Gamini
    Sharma, Manish Kumar
    Sharma, Dakshata
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ICT IN BUSINESS INDUSTRY & GOVERNMENT (ICTBIG), 2016,
  • [47] A multiple k-means cluster ensemble framework for clustering citation trajectories
    Chakraborty, Joyita
    Pradhan, Dinesh K.
    Nandi, Subrata
    JOURNAL OF INFORMETRICS, 2024, 18 (02)
  • [48] Seeding Cluster centers of K-means Clustering through Median projection
    Suresh, L.
    Simha, Jay B.
    Velur, Rajappa
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010, : 217 - 222
  • [49] Clique partitioning for clustering: A comparison with K-means and latent class analysis
    Wang, Haibo
    Obremski, Tom
    Alidaee, Bahram
    Kochenberger, Gary
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2008, 37 (01) : 1 - 13
  • [50] Multimorbidity patterns with K-means nonhierarchical cluster analysis
    Concepción Violán
    Albert Roso-Llorach
    Quintí Foguet-Boreu
    Marina Guisado-Clavero
    Mariona Pons-Vigués
    Enriqueta Pujol-Ribera
    Jose M. Valderas
    BMC Family Practice, 19