EFFICIENT DENSITY-BASED PARTITIONAL CLUSTERING ALGORITHM

被引：3

作者：

Alamgir, Zareen ^{[1
]}

Naveed, Hina ^{[1
]}

机构：

[1] Natl Univ Comp & Emerging Sci, Comp Sci Dept, Lahore, Pakistan

来源：

COMPUTING AND INFORMATICS | 2021年 / 40卷 / 06期

关键词：

Clustering; K-means; density-based K-means; EDK-means; partitional clustering; K-MEANS; INITIALIZATION; STABILITY;

D O I：

10.31577/cai_2021_6_1322

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Clustering is an important data mining technique that helps to detect hidden structures and patterns in the data. K-means algorithm is one of the most popular and widely used partitional clustering algorithms. It is a simple and efficient method but has several shortcomings. One major drawback of traditional K-means is that it selects initial centroids randomly, resulting in low-quality clusters. Various K-means extensions are designed to solve the issue of the random initial centroid. A novel density-based K-means (DK-means) algorithm is recently proposed that uses density-parameters for selecting initial centroids. It outperforms K-means in terms of accuracy at the cost of time. In this research, we present an efficient density-based K-means algorithm (EDK-means) that uses advance data structures and significantly reduces the DK-means algorithm's execution time. Furthermore, we rigorously evaluated the performance of density-based K-means on different challenging real-world datasets and compared it with traditional K-means. The experimental results are promising and show that density-based K-means outperforms K-means. It converges more rapidly than basic K-means, and it works well for the datasets with different cluster sizes.

引用

页码：1322 / 1344

页数：23

共 24 条

[1] Cuckoo, Bat and Krill Herd based k-means plus plus clustering algorithms [J].

Aggarwal, Shruti ;

Singh, Paramvir .

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 6) :14169-14180

[2]

Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027

[3] Stability of k-means clustering [J].

Ben-David, Shai ;

Pal, Ddvid ;

Simon, Hans Ulrich .

LEARNING THEORY, PROCEEDINGS, 2007, 4539 :20-+

[4]

Bhasin A, 2019, CREDIT CARD DATASET

[5] Gabriel graph-based connectivity and density for internal validity of clustering [J].

Boudane, Fatima ;

Berrichi, Ali .

PROGRESS IN ARTIFICIAL INTELLIGENCE, 2020, 9 (03) :221-238

[6] HOW THE INITIALIZATION AFFECTS THE STABILITY OF THE k-MEANS ALGORITHM [J].

Bubeck, Sebastien ;

Meila, Marina ;

von Luxburg, Ulrike .

ESAIM-PROBABILITY AND STATISTICS, 2012, 16 :436-452

[7]

Dua D, UCI MACHINE LEARNING

[8]

Ester M., 1996, PROC 2 INT C KNOWLED, P226, DOI DOI 10.5555/3001460.3001507

[9] A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis [J].

Fahad, Adil ;

Alshatri, Najlaa ;

Tari, Zahir ;

Alamri, Abdullah ;

Khalil, Ibrahim ;

Zomaya, Albert Y. ;

Foufou, Sebti ;

Bouras, Abdelaziz .

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014, 2 (03) :267-279

[10] How much can k-means be improved by using better initialization and repeats? [J].

Franti, Pasi ;

Sieranoja, Sami .

PATTERN RECOGNITION, 2019, 93 :95-112

← 1 2 3 →