Efficient SimRank-based Similarity Join Over Large Graphs

被引:44
作者
Zheng, Weiguo [1 ]
Zou, Lei [1 ]
Feng, Yansong [1 ]
Chen, Lei [2 ]
Zhao, Dongyan [1 ]
机构
[1] Peking Univ, Beijing, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013年 / 6卷 / 07期
基金
国家高技术研究发展计划(863计划);
关键词
D O I
10.14778/2536349.2536350
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality. Note that "SimRank" is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
引用
收藏
页码:493 / 504
页数:12
相关论文
共 50 条
[21]   Efficient similarity join of large sets of moving object trajectories [J].
Ding, Hui ;
Trajcevski, Goce ;
Scheuermann, Peter .
TIME 2008: 15TH INTERNATIONAL SYMPOSIUM ON TEMPORAL REPRESENTATION AND REASONING, PROCEEDINGS, 2008, :79-87
[22]   Efficient and Effective Similarity Search over Bipartite Graphs [J].
Yang, Renchi .
PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, :308-318
[23]   Highly Efficient String Similarity Search and Join over Compressed Indexes [J].
Xiao, Guorui ;
Wang, Jin ;
Lin, Chunbin ;
Zaniolo, Carlo .
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, :232-244
[24]   Towards Efficient SimRank Computation on Large Networks [J].
Yu, Weiren ;
Lin, Xuemin ;
Zhang, Wenjie .
2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, :601-612
[25]   Efficient Parallel Processing of Distance Join Queries Over Distributed Graphs [J].
Zhang, Xiaofei ;
Chen, Lei ;
Wang, Min .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) :740-754
[26]   Sig-SR: SimRank Search over Singular Graphs [J].
Yu, Weiren ;
McCann, Julie A. .
SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, :859-862
[27]   Exact Single-Source SimRank Computation on Large Graphs [J].
Wang, Hanzhi ;
Wei, Zhewei ;
Yuan, Ye ;
Du, Xiaoyong ;
Wen, Ji-Rong .
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, :653-663
[28]   An Efficient Similarity Join Algorithm with Cosine Similarity Predicate [J].
Lee, Dongjoo ;
Park, Jaehui ;
Shim, Junho ;
Lee, Sang-goo .
DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT 2, 2010, 6262 :422-+
[29]   Scalable Single-source SimRank Computation for Large Graphs [J].
Gao, Xingkun ;
Bao, Nianyuan ;
Liu, Jie ;
Tang, Jie ;
Wu, Gangshan .
2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, :1083-1091
[30]   Accelerating pairwise SimRank estimation over static and dynamic graphs [J].
Yue Wang ;
Lei Chen ;
Yulin Che ;
Qiong Luo .
The VLDB Journal, 2019, 28 :99-122