Density-based clustering of big probabilistic graphs

Cited by: 17
Authors
Halim, Zahid [1 ]
Khattak, Jamal Hussain [1 ,2 ]
Affiliations
[1] Ghulam Ishaq Khan Inst Engn Sci & Technol, Fac Comp Sci & Engn, Topi, Pakistan
[2] Allied Bank Ltd, Informat Technol Grp, Business Solut & Dev, Lahore, Pakistan
Keywords
Clustering graphs; Machine learning; Big graphs; Clustering; Community detection; Uncertain data; Algorithm
DOI
10.1007/s12530-018-9223-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a machine learning task that groups similar objects into coherent sets, so that objects within a cluster exhibit similar behavior. With the exponential increase in data volume, robust approaches are required to process data and extract clusters. In addition to large volume, datasets may carry uncertainty due to the heterogeneity of their sources, a combination characteristic of Big Data. Modern machine learning approaches and algorithms widely use probability theory to quantify data uncertainty. Such large, uncertain data can be transformed into a probabilistic graph-based representation. This work presents an approach for density-based clustering of big probabilistic graphs. The approach clusters large probabilistic graphs using the graph's density, with the clustering process guided by node degree and neighborhood information. It is evaluated on seven real-world benchmark datasets, namely protein-to-protein interaction, yahoo, movie-lens, core, last.fm, delicious social bookmarking system, and epinions. These datasets are first transformed into a graph-based representation before the proposed clustering algorithm is applied. The results are evaluated using three cluster validation indices: the Davies-Bouldin index, the Dunn index, and the Silhouette coefficient. The proposal is also compared with four state-of-the-art approaches for clustering large probabilistic graphs. The results across the seven datasets and three cluster validity indices suggest that the proposed approach performs better.
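To make the idea of degree- and neighborhood-guided clustering on a probabilistic graph concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes a DBSCAN-style scheme in which a node counts as dense when its expected degree (the sum of its incident edge probabilities) reaches a threshold, and clusters grow only along edges whose probability is high enough. The toy edge list, the function names, and the parameters `min_prob` and `min_expected_degree` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy probabilistic graph: (node, node, edge existence probability).
EDGES = [
    ("a", "b", 0.9),
    ("a", "c", 0.8),
    ("b", "c", 0.7),
    ("c", "d", 0.1),  # weak bridge between the two dense groups
    ("d", "e", 0.9),
    ("d", "f", 0.8),
    ("e", "f", 0.9),
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> {neighbor: probability}."""
    adj = defaultdict(dict)
    for u, v, p in edges:
        adj[u][v] = p
        adj[v][u] = p
    return adj


def expected_degree(adj, node):
    """Expected degree of a node: the sum of its incident edge probabilities."""
    return sum(adj[node].values())


def cluster_by_density(edges, min_prob=0.5, min_expected_degree=1.0):
    """Group nodes DBSCAN-style (an assumption, not the paper's method).

    A node is 'core' if its expected degree is at least min_expected_degree.
    Clusters expand from core nodes along edges with probability >= min_prob.
    """
    adj = build_adjacency(edges)
    core = {n for n in adj if expected_degree(adj, n) >= min_expected_degree}
    clusters, seen = [], set()
    for start in sorted(core):
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            if node in core:  # only core nodes pull in their neighbors
                for nbr, p in adj[node].items():
                    if p >= min_prob and nbr not in seen:
                        stack.append(nbr)
        clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    print(cluster_by_density(EDGES))
```

On this toy graph the low-probability edge between `c` and `d` is never followed, so the nodes split into two groups. Nodes that are never reached from a core node are left unassigned, which mirrors how density-based methods usually treat noise.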
Pages: 333-350
Page count: 18
Related papers
50 in total
  • [31] TOBAE: A Density-based Agglomerative Clustering Algorithm
    Khalid, Shehzad
    Razzaq, Shahid
    JOURNAL OF CLASSIFICATION, 2015, 32 (02) : 241 - 267
  • [32] Incremental Density-Based Clustering on Multicore Processors
    Mai, Son T.
    Jacobsen, Jon
    Amer-Yahia, Sihem
    Spence, Ivor
    Nhat-Phuong Tran
    Assent, Ira
    Quoc Viet Hung Nguyen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (03) : 1338 - 1356
  • [33] TOBAE: A Density-based Agglomerative Clustering Algorithm
    Shehzad Khalid
    Shahid Razzaq
    Journal of Classification, 2015, 32 : 241 - 267
  • [34] An Efficient Density-Based Algorithm for Data Clustering
    Theljani, Foued
    Laabidi, Kaouther
    Zidi, Salah
    Ksouri, Moufida
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (04)
  • [35] Performance evaluation of density-based clustering methods
    Aliguliyev, Ramiz M.
    INFORMATION SCIENCES, 2009, 179 (20) : 3583 - 3602
  • [36] A Hybrid Recommendation System Based on Density-Based Clustering
    Tsikrika, Theodora
    Symeonidis, Spyridon
    Gialampoukidis, Ilias
    Satsiou, Anna
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    INTERNET SCIENCE, INSCI 2017, 2018, 10750 : 49 - 57
  • [37] Efficient Distributed Approach for Density-Based Clustering
    Laloux, Jean-Francois
    Le-Khac, Nhien-An
    Kechadi, M-Tahar
    2011 20TH IEEE INTERNATIONAL WORKSHOPS ON ENABLING TECHNOLOGIES: INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE), 2011, : 145 - 150
  • [38] GrDBSCAN: A Granular Density-Based Clustering Algorithm
    Suchy, Dawid
    Siminski, Krzysztof
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2023, 33 (02) : 297 - 312
  • [39] EFFICIENT DENSITY-BASED PARTITIONAL CLUSTERING ALGORITHM
    Alamgir, Zareen
    Naveed, Hina
    COMPUTING AND INFORMATICS, 2021, 40 (06) : 1322 - 1344
  • [40] Density-based clustering with boundary samples verification
    Peng, Jie
    Chen, Yong
    APPLIED SOFT COMPUTING, 2024, 159