Data clustering: application and trends

Cited by: 72
Authors
Oyewole, Gbeminiyi John [1]
Thopil, George Alex [1]
Affiliations
[1] Univ Pretoria, Dept Engn & Technol Management, Pretoria, South Africa
Keywords
Clustering; Clustering classification; Clustering components; Industry applications; Clustering algorithms; Clustering trends; PATTERN-CLASSIFICATION; R PACKAGE; ALGORITHMS; SYSTEM; ICT; INFORMATION; EXPLORATION; INDICATORS; CHALLENGES; MANAGEMENT
DOI
10.1007/s10462-022-10325-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering has primarily been used as an analytical technique to group unlabeled data for extracting meaningful information. Because no single clustering algorithm can solve all clustering problems, many algorithms have been developed for diverse applications. We review data clustering, intending to underscore recent applications in selected industrial sectors and other notable concepts. We begin by highlighting clustering components and discussing classification terminologies. We then discuss specific and general applications of clustering. Notable concepts on clustering algorithms, emerging variants, measures of similarity and dissimilarity, and issues surrounding clustering optimization, validation, and data types are outlined. Suggestions are made to emphasize the continued interest in clustering techniques among both scholars and industry practitioners. Key findings show that data size serves as a classification criterion and that, as data sets for clustering become larger and more varied, determining the optimal number of clusters will require new feature extraction methods, validation indices, and clustering techniques. In addition, clustering techniques have found growing use in key industry sectors linked to the sustainable development goals, such as manufacturing, transportation and logistics, energy, and healthcare, where clustering is more often integrated with other analytical techniques than used as a stand-alone technique.
Pages: 6439-6475
Page count: 37
Source: Artificial Intelligence Review, 2023, 56: 6439-6475