A fast clustering algorithm based on embedding technology for heterogeneous information networks

Cited by: 0
Authors
Chen, Li-Min [1, 2]
Yang, Jing [1 ]
Zhang, Jian-Pei [1 ]
Affiliations
[1] Institute of Computer Science and Technology, Harbin Engineering University, Harbin
[2] Institute of Computer Science and Technology, Mudanjiang Normal University, Mudanjiang
Source
Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology | 2015, Vol. 37, No. 11
Funding
National Natural Science Foundation of China
Keywords
Clustering; Commute distance; Embedding; Heterogeneous information network; Sum of weighted distances
DOI
10.11999/JEIT150106
Abstract
Clustering heterogeneous information networks is an active research topic. Exploiting the sparsity of such networks, this paper proposes a fast embedding-based clustering algorithm for heterogeneous information networks with a star network schema. First, the heterogeneous information network is transformed into a set of compatible bipartite graphs from a compatibility point of view. Then, an approximate commute distance embedding of each bipartite graph is computed via random mapping and a linear-time solver; within each embedding, an indicator subset represents the target dataset. Finally, a general model is formulated over all the indicator subsets, and a minimum of the model is obtained by clustering all indicator subsets simultaneously, using the sum of the weighted distances of all indicators that belong to the same target object. Theoretical analysis and experimental verification show that the proposed algorithm is effective. © 2015, Science Press. All rights reserved.
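The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the embedding uses an exact dense eigendecomposition of the graph Laplacian instead of the approximate random mapping plus linear-time solver named in the abstract, the distance is taken to be squared Euclidean, the clustering loop is a k-means-style iteration with farthest-point seeding, and the function names, weights, and toy matrices are hypothetical.

```python
import numpy as np

def commute_embedding(B):
    """Exact commute-distance embedding of one bipartite graph.

    B is an (n_target x n_attr) nonnegative weight matrix. Rows of the result
    are nodes (target objects first); squared Euclidean distances between rows
    equal commute distances. Assumption: dense eigendecomposition, which stands
    in for the approximate embedding described in the abstract.
    """
    n, m = B.shape
    A = np.zeros((n + m, n + m))
    A[:n, n:] = B
    A[n:, :n] = B.T
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                    # graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    keep = vals > 1e-9 * vals.max()         # drop the zero eigenvalue
    return np.sqrt(deg.sum()) * vecs[:, keep] / np.sqrt(vals[keep])

def weighted_sq_dists(indicators, weights, centers):
    """cost[i, c] = sum over graphs v of w_v * ||X_v[i] - centers_v[c]||^2."""
    return sum(w * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
               for X, w, C in zip(indicators, weights, centers))

def cluster_indicators(indicators, weights, k, n_iter=50):
    """Cluster target objects using all indicator subsets at once."""
    seeds = [0]                             # deterministic farthest-point seeding
    while len(seeds) < k:
        cost = weighted_sq_dists(indicators, weights,
                                 [X[seeds] for X in indicators])
        seeds.append(int(cost.min(axis=1).argmax()))
    centers = [X[seeds].copy() for X in indicators]
    labels = None
    for _ in range(n_iter):
        new_labels = weighted_sq_dists(indicators, weights, centers).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for X, C in zip(indicators, centers):
            for c in range(k):
                if np.any(labels == c):     # keep the old center if a cluster empties
                    C[c] = X[labels == c].mean(axis=0)
    return labels

# Toy star schema: 4 target objects linked to two attribute types.
B1 = np.array([[2.0, 0.0], [2.0, 0.1], [0.1, 2.0], [0.0, 2.0]])
B2 = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.1], [0.1, 0.0, 1.0], [0.0, 0.0, 1.0]])
n_targets = B1.shape[0]
# The indicator subset of each embedding is the block of target-object rows.
indicators = [commute_embedding(B)[:n_targets] for B in (B1, B2)]
labels = cluster_indicators(indicators, weights=[0.5, 0.5], k=2)
print(labels)
```

In this sketch every target object has one indicator row per bipartite graph, and its cluster is chosen by the weighted sum of its distances over all graphs, which mirrors the "sum of weighted distances" idea. For large sparse networks the dense eigendecomposition would need to be replaced by an approximate embedding of the kind the abstract describes.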
Pages: 2634-2641
Page count: 7
Related papers
22 in total
  • [1] Xiao J.-B., Zhang S.-W., An algorithm of integrating random walk and increment correlative vertexes for mining community of dynamic networks, Journal of Electronics & Information Technology, 35, 4, pp. 977-981, (2013)
  • [2] Chen J.-M., Chen J.-J., Liu J., et al., Clustering algorithms for large-scale social networks based on structural similarity, Journal of Electronics & Information Technology, 37, 2, pp. 449-454, (2015)
  • [3] Sun Y., Han J., Mining heterogeneous information networks: principles and methodologies, Synthesis Lectures on Data Mining and Knowledge Discovery, 3, 2, pp. 1-159, (2012)
  • [4] Huang Y., Gao X., Clustering on heterogeneous networks, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4, 3, pp. 213-233, (2014)
  • [5] Gao B., Liu T.Y., Zheng X., et al., Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering, Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 41-50, (2005)
  • [6] Gao B., Liu T., Ma W.-Y., Star-structured high-order heterogeneous data co-clustering based on consistent information theory, Proceedings of the 6th International Conference on Data Mining (ICDM 2006), pp. 880-884, (2006)
  • [7] Long B., Zhang Z.M., Wu X., et al., Spectral clustering for multi-type relational data, Proceedings of the 23rd International Conference on Machine Learning, pp. 585-592, (2006)
  • [8] Sun Y., Yu Y., Han J., Ranking-based clustering of heterogeneous information networks with star network schema, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797-806, (2009)
  • [9] Li P., Wen J., Li X., SNTClus: a novel service clustering algorithm based on network analysis and service tags, Przeglad Elektrotechniczny, 89, 1, pp. 208-210, (2013)
  • [10] Li P., Chen L., Li X., et al., RNRank: Network-Based Ranking on Relational Tuples, pp. 139-150, (2013)