Multiple clusterings of heterogeneous information networks

Cited by: 4
Authors
Wei, Shaowei [1]
Yu, Guoxian [1, 2]
Wang, Jun [3]
Domeniconi, Carlotta [4]
Zhang, Xiangliang [5]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
[2] Shandong Univ, Sch Software, Jinan, Peoples R China
[3] Shandong Univ, Joint SDU NTU Ctr Artificial Intelligence Res, Jinan, Peoples R China
[4] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[5] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal, Saudi Arabia
Funding
National Natural Science Foundation of China
Keywords
Multiple clusterings; Heterogeneous information networks; Meta-path; Quality and diversity; Network embedding;
DOI
10.1007/s10994-021-06000-y
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional clustering algorithms produce a single clustering result and therefore cannot explore the diverse patterns that may be present in complex real-world data. To address this problem, approaches that discover meaningful alternative clusterings in data have been developed in recent years. Existing algorithms, including single-view and multi-view multiple clustering methods, are designed for i.i.d. data samples and cannot handle the dependencies between samples found in networks, especially heterogeneous information networks (HINs). In this paper, we propose a framework (NetMCs) that can explore multiple clusterings in HINs. Specifically, NetMCs adopts a set of meta-path schemes with different semantics on the HIN and treats each meta-path scheme as a base clustering aspect. Guided by the meta-path schemes, NetMCs then introduces a variation of the skip-gram framework that jointly optimizes multiple clustering aspects and simultaneously obtains the embedding representation and the individual clustering for each aspect. To reduce redundancy between alternative clusterings, NetMCs uses an explicit regularization term to control the embedding diversity of the same nodes across different clustering aspects. Experiments on benchmark HIN datasets confirm that NetMCs generates multiple clusterings with high quality and diversity.
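To make the idea of a meta-path scheme acting as a clustering aspect more concrete, the following is a minimal sketch of a meta-path-guided random walk on a toy HIN. It is not the NetMCs implementation: the toy graph, the single-letter node types (A, P, V), and the names `build_adjacency` and `metapath_walk` are all assumptions made for illustration. Walks of this kind are one common way to produce the node sequences that a skip-gram model is trained on; the abstract does not state how NetMCs samples its training contexts.

```python
import random

# Hypothetical toy HIN: authors (A), papers (P), one venue (V).
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P", "v1": "V"}
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
         ("a3", "p2"), ("p1", "v1"), ("p2", "v1")]


def build_adjacency(edge_list):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, scheme, length, rng):
    """Random walk whose node types follow a symmetric scheme such as "APA".

    The scheme's first and last types coincide, so the walk cycles through
    scheme[:-1] ("APA" -> "AP", "APVPA" -> "APVP").
    """
    if node_type[start] != scheme[0] or scheme[0] != scheme[-1]:
        raise ValueError("start node and scheme endpoints must share a type")
    cycle = scheme[:-1]
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


rng = random.Random(0)
adj = build_adjacency(edges)
# Two schemes with different semantics: co-authorship vs. shared venue.
walk_apa = metapath_walk(adj, node_type, "a1", "APA", 7, rng)
walk_apvpa = metapath_walk(adj, node_type, "a1", "APVPA", 9, rng)
print(walk_apa)
print(walk_apvpa)
```

Under these assumptions, the A-P-A walks keep an author close to co-authors, while the A-P-V-P-A walks also connect authors who publish at the same venue, so embeddings learned from the two walk sets can support different groupings of the same author nodes.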
Pages: 1505-1526
Page count: 22