Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs

被引:7
作者
Chen, Lu [1 ]
Gao, Yunjun [1 ]
Huang, Xingrui [1 ]
Jensen, Christian S. [2 ]
Zheng, Bolong [3 ]
机构
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Zhejiang, Peoples R China
[2] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
基金
国家重点研发计划;
关键词
Clustering algorithms; Computational modeling; Distributed databases; Clustering methods; Blogs; Social networking (online); Scalability; Heterogeneous graph; clustering; distributed processing; algorithm; ENERGY-EFFICIENT; FRAMEWORK;
D O I
10.1109/TKDE.2020.3047631
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data nowadays challenges traditional clustering algorithms, and thus, a distributed method is required. Hence, we adopt a popular distributed graph computing system Blogel, based on which, we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insights into the effectiveness and efficiency of our proposals.
引用
收藏
页码:4781 / 4796
页数:16
相关论文
共 54 条
[1]  
Andersen R, 2007, LECT NOTES COMPUT SC, V4863, P150
[2]  
[Anonymous], 2013, Proc. Adv. Neural Inf. Process. Syst.
[3]  
[Anonymous], 2013, P 2013 ACM SIGMOD IN, DOI [10.1145/2463676.2467799, DOI 10.1145/2463676.2467799]
[4]   Real-Time Multi-Criteria Social Graph Partitioning: A Game Theoretic Approach [J].
Armenatzoglou, Nikos ;
Huy Pham ;
Ntranos, Vasilis ;
Papadias, Dimitris ;
Shahabi, Cyrus .
SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, :1617-1628
[5]  
Baroni Alessandro, 2017, 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), P369, DOI 10.1145/3110025.3110030
[6]  
Boden Brigitte, 2014, Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2014. Proceedings: LNCS 8724, P149, DOI 10.1007/978-3-662-44848-9_10
[7]   Clustering attributed graphs: Models, measures and methods [J].
Bothorel, Cecile ;
Cruz, Juan David ;
Magnani, Matteo ;
Micenkova, Barbora .
NETWORK SCIENCE, 2015, 3 (03) :408-444
[8]   Dynamic Cluster Formation Game for Attributed Graph Clustering [J].
Bu, Zhan ;
Li, Hui-Jia ;
Cao, Jie ;
Wang, Zhen ;
Gao, Guangliang .
IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (01) :328-341
[9]  
Chen J., 2015, Proceedings of the 2015 SIAM International Conference on Data Mining, P424
[10]   Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs [J].
Chen, Lu ;
Gao, Yunjun ;
Zhang, Yuanliang ;
Jensen, Christian S. ;
Zheng, Bolong .
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, :256-267