EFFICIENT DENSITY-BASED PARTITIONAL CLUSTERING ALGORITHM

被引:3
作者
Alamgir, Zareen [1 ]
Naveed, Hina [1 ]
机构
[1] Natl Univ Comp & Emerging Sci, Comp Sci Dept, Lahore, Pakistan
关键词
Clustering; K-means; density-based K-means; EDK-means; partitional clustering; K-MEANS; INITIALIZATION; STABILITY;
D O I
10.31577/cai_2021_6_1322
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Clustering is an important data mining technique that helps to detect hidden structures and patterns in the data. K-means algorithm is one of the most popular and widely used partitional clustering algorithms. It is a simple and efficient method but has several shortcomings. One major drawback of traditional K-means is that it selects initial centroids randomly, resulting in low-quality clusters. Various K-means extensions are designed to solve the issue of the random initial centroid. A novel density-based K-means (DK-means) algorithm is recently proposed that uses density-parameters for selecting initial centroids. It outperforms K-means in terms of accuracy at the cost of time. In this research, we present an efficient density-based K-means algorithm (EDK-means) that uses advance data structures and significantly reduces the DK-means algorithm's execution time. Furthermore, we rigorously evaluated the performance of density-based K-means on different challenging real-world datasets and compared it with traditional K-means. The experimental results are promising and show that density-based K-means outperforms K-means. It converges more rapidly than basic K-means, and it works well for the datasets with different cluster sizes.
引用
收藏
页码:1322 / 1344
页数:23
相关论文
共 24 条
[1]   Cuckoo, Bat and Krill Herd based k-means plus plus clustering algorithms [J].
Aggarwal, Shruti ;
Singh, Paramvir .
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 6) :14169-14180
[2]  
Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
[3]   Stability of k-means clustering [J].
Ben-David, Shai ;
Pal, Ddvid ;
Simon, Hans Ulrich .
LEARNING THEORY, PROCEEDINGS, 2007, 4539 :20-+
[4]  
Bhasin A, 2019, CREDIT CARD DATASET
[5]   Gabriel graph-based connectivity and density for internal validity of clustering [J].
Boudane, Fatima ;
Berrichi, Ali .
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2020, 9 (03) :221-238
[6]   HOW THE INITIALIZATION AFFECTS THE STABILITY OF THE k-MEANS ALGORITHM [J].
Bubeck, Sebastien ;
Meila, Marina ;
von Luxburg, Ulrike .
ESAIM-PROBABILITY AND STATISTICS, 2012, 16 :436-452
[7]  
Dua D, UCI MACHINE LEARNING
[8]  
Ester M., 1996, PROC 2 INT C KNOWLED, P226, DOI DOI 10.5555/3001460.3001507
[9]   A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis [J].
Fahad, Adil ;
Alshatri, Najlaa ;
Tari, Zahir ;
Alamri, Abdullah ;
Khalil, Ibrahim ;
Zomaya, Albert Y. ;
Foufou, Sebti ;
Bouras, Abdelaziz .
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014, 2 (03) :267-279
[10]   How much can k-means be improved by using better initialization and repeats? [J].
Franti, Pasi ;
Sieranoja, Sami .
PATTERN RECOGNITION, 2019, 93 :95-112