HeteRank: A general similarity measure in heterogeneous information networks by integrating multi-type relationships

Times cited: 20
Authors
Zhang, Mingxi [1]
Wang, Jinhua [2]
Wang, Wei [2]
Affiliations
[1] University of Shanghai for Science and Technology, Shanghai, People's Republic of China
[2] Fudan University, Shanghai, People's Republic of China
Funding
Natural Science Foundation of Shanghai
Keywords
Similarity computation; HeteRank; Information network; Search; Recommendation
DOI
10.1016/j.ins.2018.04.022
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
As heterogeneous information networks become ubiquitous and complex, many data mining tasks on them have been explored, including clustering, collaborative filtering and link prediction. Similarity computation is a fundamental task underlying many data mining problems. Although a large number of similarity measures have been developed for assessing similarities in heterogeneous networks, they usually depend on the network schema and lack a general way of integrating the various kinds of relationships between objects. In this paper, we propose a similarity measure, namely HeteRank, for computing similarities in heterogeneous information networks in a general manner. The relationships between objects of different types are represented by a general relationship matrix (GRM), which is built from the scales of the different object types. Based on the GRM, HeteRank fully integrates the multi-type relationships into similarity computation by utilizing all the meetings between objects. The HeteRank equation is further transformed into a simple binomial expression that takes the restart probability into account. To compute HeteRank similarities efficiently, we divide the computation into two steps: the first step computes intermediate values, and the second step computes the similarities from those intermediate values. We then approximate the HeteRank equation by setting thresholds that skip small intermediate values and similarity scores. A pruning algorithm is developed to avoid unnecessary visits, multiplications and additions that contribute little to the similarity computation. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of HeteRank in comparison with state-of-the-art similarity measures. (C) 2018 Elsevier Inc. All rights reserved.
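The abstract describes the computation only at a high level. The Python sketch below illustrates the general pattern it names: one combined relationship matrix over all object types, random-walk meetings weighted by a restart probability, a two-step computation (intermediate values first, similarities second), and thresholds that discard small values. It is not the HeteRank equation. The row-normalized block matrix used as a stand-in for the GRM (the paper's scale-based construction is not reproduced), the meeting formula, and the names `build_grm`, `meeting_similarity`, `c`, `max_steps`, `eps_mid` and `eps_sim` are all hypothetical choices made for illustration.

```python
import numpy as np


def build_grm(blocks, sizes):
    """Hypothetical stand-in for a general relationship matrix (GRM).

    sizes:  dict mapping object type -> number of objects (insertion order fixes layout)
    blocks: dict mapping (type_i, type_j) -> 0/1 array of shape (sizes[type_i], sizes[type_j])
    Each block is mirrored so links are undirected, then rows are normalized to sum
    to 1 so the matrix can be used as a random-walk transition matrix.
    NOTE: this does NOT implement the paper's scale-based weighting of object types.
    """
    offsets, total = {}, 0
    for t, n in sizes.items():
        offsets[t] = total
        total += n
    A = np.zeros((total, total))
    for (ti, tj), M in blocks.items():
        r, c = offsets[ti], offsets[tj]
        A[r:r + sizes[ti], c:c + sizes[tj]] += M
        A[c:c + sizes[tj], r:r + sizes[ti]] += np.asarray(M).T
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows of isolated objects stay all-zero instead of dividing by zero.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)


def meeting_similarity(W, c=0.2, max_steps=10, eps_mid=1e-4, eps_sim=1e-4):
    """Two-step, threshold-pruned meeting similarity (hypothetical sketch).

    Step 1 (intermediate values): P_k[a, x] = c * (1 - c)**k * (W^k)[a, x],
        the restart-weighted probability that a walk from a is at x after k steps.
        Entries below eps_mid are set to zero.
    Step 2 (similarities): s(a, b) = sum_k sum_x P_k[a, x] * P_k[b, x],
        the weighted chance that walkers from a and b meet at the same object after
        the same number of steps. Scores below eps_sim are set to zero.
    """
    n = W.shape[0]
    S = np.zeros((n, n))
    walk = np.eye(n)  # W^0
    for k in range(max_steps + 1):
        P = c * (1 - c) ** k * walk   # intermediate values for step k
        P[P < eps_mid] = 0.0          # skip small intermediate values
        S += P @ P.T                  # add meetings that happen at step k
        walk = walk @ W               # advance to W^(k+1)
    S[S < eps_sim] = 0.0              # skip small similarity scores
    return S


if __name__ == "__main__":
    # Toy network: authors a0, a1; papers p0, p1; venue v0.
    sizes = {"author": 2, "paper": 2, "venue": 1}
    blocks = {
        ("author", "paper"): np.array([[1, 0], [1, 1]]),  # a0-p0, a1-p0, a1-p1
        ("paper", "venue"): np.array([[1], [1]]),          # p0-v0, p1-v0
    }
    S = meeting_similarity(build_grm(blocks, sizes))
    print(np.round(S, 4))
```

Because this version uses dense matrix products, the thresholds only zero out small values after they have been computed. The pruning algorithm described in the abstract instead avoids those visits and multiplications altogether, which this sketch does not attempt; a sparse-matrix variant would be the natural next step.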
Pages: 389-407
Page count: 19