A survey of clustering algorithms for an industrial context

Cited by: 62
Authors
Benabdellah, Abla Chaouni [1]
Benghabrit, Asmaa [2]
Bouhaddou, Imane [1]
Affiliations
[1] Moulay Ismail Univ, LM21 Lab ENSAM, Meknes, Morocco
[2] Mohamed V Univ, ENSMR, LMAID Lab, Rabat, Morocco
Source
SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2018) | 2019 / Vol. 148
Keywords
Clustering algorithms; Unsupervised learning; Sparse dataset; Aircraft; Automotive; Logistics; Industrial datasets; EM ALGORITHM; OPTIMIZATION; WORKERS;
DOI
10.1016/j.procs.2019.01.022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Across a wide variety of fields, and especially in industrial companies, data are being collected and accumulated at a dramatic pace from many different resources and services. Hence, there is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information from the rapidly growing volumes of digital data. Clustering is a well-known, fundamental data mining task for extracting such information. However, as applications have diversified across domains, researchers have developed a large number of clustering algorithms, which makes it difficult for researchers and practitioners to keep up with developments in the field. Finding an appropriate algorithm therefore helps significantly in organizing information and extracting the correct answers to database queries. In this respect, the aim of this paper is to find the appropriate clustering algorithm for sparse industrial datasets. To achieve this goal, we first present related work from the past twenty years that focuses on comparing different clustering algorithms. We then categorize the clustering algorithms found in the literature by matching their properties to the 4V's challenges of Big Data, which allows us to select the candidate clustering algorithms. Finally, K-means, agglomerative hierarchical clustering, DBSCAN and SOM are implemented and compared on four datasets using internal validity indices, and we highlight the best-performing clustering algorithm, that is, the one that yields the most efficient clusters, for each dataset. © 2019 The Authors. Published by Elsevier B.V.
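The abstract does not name the internal validity indices used. As an illustration only, the sketch below assumes the silhouette coefficient, one common internal index, and implements it in plain Python. The function name `silhouette`, the toy points and the two candidate labelings are hypothetical and are not taken from the paper; in practice the labelings would come from the algorithms being compared.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_a = [0, 0, 0, 1, 1, 1]  # labeling that matches the two groups
labels_b = [0, 1, 0, 1, 0, 1]  # labeling that mixes the two groups

score_a = silhouette(points, labels_a)
score_b = silhouette(points, labels_b)
print(f"labeling A: {score_a:.3f}, labeling B: {score_b:.3f}")
```

Because the index needs only the data and the cluster labels, the same function can score the output of any clustering algorithm, which is what makes internal indices usable when no ground-truth labels exist.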
Pages: 291-302
Page count: 12