Efficient SimRank-based Similarity Join Over Large Graphs

被引:44
作者
Zheng, Weiguo [1 ]
Zou, Lei [1 ]
Feng, Yansong [1 ]
Chen, Lei [2 ]
Zhao, Dongyan [1 ]
机构
[1] Peking Univ, Beijing, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013年 / 6卷 / 07期
基金
国家高技术研究发展计划(863计划);
关键词
D O I
10.14778/2536349.2536350
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality. Note that "SimRank" is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
引用
收藏
页码:493 / 504
页数:12
相关论文
共 50 条
[31]   Scalable Single-source SimRank Computation for Large Graphs [J].
Gao, Xingkun ;
Bao, Nianyuan ;
Liu, Jie ;
Tang, Jie ;
Wu, Gangshan .
2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, :1083-1091
[32]   Accelerating pairwise SimRank estimation over static and dynamic graphs [J].
Wang, Yue ;
Chen, Lei ;
Che, Yulin ;
Luo, Qiong .
VLDB JOURNAL, 2019, 28 (01) :99-122
[33]   Similarity Join over Array Data [J].
Zhao, Weijie ;
Rusu, Florin ;
Dong, Bin ;
Wu, Kesheng .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :2007-2022
[34]   Fast-Join: An Efficient Method for Fuzzy Token Matching based String Similarity Join [J].
Wang, Jiannan ;
Li, Guoliang ;
Fe, Jianhua .
IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, :458-469
[35]   Fast-join: An efficient method for fuzzy token matching based string similarity join [J].
Wang, Jiannan ;
Li, Guoliang ;
Fe, Jianhua .
Proceedings - International Conference on Data Engineering, 2011, :458-469
[36]   Taming Computational Complexity: Efficient and Parallel SimRank Optimizations on Undirected Graphs [J].
Yu, Weiren ;
Lin, Xuemin ;
Le, Jiajin .
WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2010, 6184 :280-+
[37]   FrepJoin: an efficient partition-based algorithm for edit similarity join [J].
Ji-zhou Luo ;
Sheng-fei Shi ;
Hong-zhi Wang ;
Jian-zhong Li .
Frontiers of Information Technology & Electronic Engineering, 2017, 18 :1499-1510
[38]   FrepJoin:an efficient partition-based algorithm for edit similarity join [J].
Jizhou LUO ;
Shengfei SHI ;
Hongzhi WANG ;
Jianzhong LI .
FrontiersofInformationTechnology&ElectronicEngineering, 2017, 18 (10) :1499-1510
[39]   FrepJoin: an efficient partition-based algorithm for edit similarity join [J].
Luo, Ji-zhou ;
Shi, Sheng-fei ;
Wang, Hong-zhi ;
Li, Jian-zhong .
FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2017, 18 (10) :1499-1510
[40]   Para-Join: An efficient parallel method for string similarity join [J].
Yan, Cairong ;
Wang, Jian ;
Zhu, Bin ;
Guo, Wenjing .
International Journal of High Performance Computing and Networking, 2017, 10 (4-5) :381-390