A survey of clustering algorithms for an industrial context

Times cited: 62

Authors:

Benabdellah, Abla Chaouni [1]

Benghabrit, Asmaa [2]

Bouhaddou, Imane [1]

Affiliations:

[1] Moulay Ismail Univ, LM21 Lab ENSAM, Meknes, Morocco

[2] Mohamed V Univ, ENSMR, LMAID Lab, Rabat, Morocco

Source:

SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2018) | 2019 / Vol. 148

Keywords:

Clustering algorithms; Unsupervised learning; Sparse dataset; Aircraft; Automotive; Logistics; Industrial datasets; EM ALGORITHM; OPTIMIZATION; WORKERS

DOI:

10.1016/j.procs.2019.01.022

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Across a wide variety of fields, and especially in industrial companies, data are being collected and accumulated at a dramatic pace from many different sources and services. Hence, there is an urgent need for a new generation of computational theories and tools to help humans extract useful information from rapidly growing volumes of digital data. Clustering is a well-known, fundamental data mining task for extracting such information. However, as applications have been adapted to various domains, researchers have developed a large number of clustering algorithms. This proliferation makes it difficult for researchers and practitioners to keep up with the development of clustering algorithms, even though finding an appropriate algorithm helps significantly to organize information and to extract correct answers from database queries. In this respect, the aim of this paper is to find the appropriate clustering algorithm for sparse industrial datasets. To achieve this goal, we first present related work from the past twenty years that compares different clustering algorithms. After that, we categorize the clustering algorithms found in the literature by matching their properties to the 4V challenges of Big Data, which allows us to select candidate clustering algorithms. Finally, K-means, agglomerative hierarchical clustering, DBSCAN and SOM are implemented and compared on four datasets using internal validity indices. In addition, we highlight, for each dataset, the best-performing clustering algorithm, i.e. the one that yields the most efficient clusters. (C) 2019 The Authors. Published by Elsevier B.V.
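
The abstract describes comparing clustering algorithms with internal validity indices but does not say which indices were used. The code below is a minimal pure-Python sketch of that idea. It assumes the silhouette coefficient as the index, Euclidean distance, and a hypothetical toy dataset. It runs Lloyd's K-means for several values of k and scores each resulting partition. The function names (`kmeans`, `silhouette`) and the data are illustrative only and are not the authors' implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means: returns one integer cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1] (assumed internal validity index)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # Mean distance from p to the members of every cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            others = [q for j, (q, lab) in enumerate(zip(points, labels))
                      if lab == c and j != i]
            mean_dist[c] = (sum(euclidean(p, q) for q in others) / len(others)
                            if others else 0.0)
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: a point in a singleton cluster scores 0
            continue
        a = mean_dist[own]  # cohesion: mean distance within p's own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.4, 0.4),
            (5.0, 5.0), (5.3, 4.8), (4.7, 5.2), (5.1, 5.4)]
    for k in (2, 3, 4):
        score = silhouette(data, kmeans(data, k))
        print(f"k={k}: mean silhouette = {score:.3f}")
```

A higher mean silhouette indicates more compact, better-separated clusters, so on data like this k=2 should score highest. Because the index depends only on the points and their labels, the same scoring function could in principle be applied to partitions produced by the other algorithms the paper compares.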

Pages: 291-302

Page count: 12
