Efficient SimRank-based Similarity Join Over Large Graphs

被引:43
|
作者
Zheng, Weiguo [1 ]
Zou, Lei [1 ]
Feng, Yansong [1 ]
Chen, Lei [2 ]
Zhao, Dongyan [1 ]
机构
[1] Peking Univ, Beijing, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013年 / 6卷 / 07期
基金
国家高技术研究发展计划(863计划);
关键词
D O I
10.14778/2536349.2536350
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality. Note that "SimRank" is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution.
引用
收藏
页码:493 / 504
页数:12
相关论文
共 50 条
  • [1] Efficient SimRank-Based Similarity Join
    Zheng, Weiguo
    Zou, Lei
    Chen, Lei
    Zhao, Dongyan
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2017, 42 (03):
  • [2] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1603 - 1604
  • [3] Efficient Top-K SimRank-based Similarity Join
    Tao, Wenbo
    Yu, Minghe
    Li, Guoliang
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03): : 317 - 328
  • [4] An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs
    Shao, Yingxia
    Cui, Bin
    Chen, Lei
    Liu, Mingming
    Xie, Xing
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (08): : 838 - 849
  • [5] Efficient and Accurate SimRank-based Similarity Joins: Experiments, Analysis, and Improvement
    Ge, Qian
    Liu, Yu
    Zhao, Yinghao
    Sun, Yuetian
    Zou, Lei
    Chen, Yuxing
    Pan, Anqun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 17 (04): : 617 - 629
  • [6] An Experimental Evaluation of SimRank-based Similarity Search Algorithms
    Zhang, Zhipeng
    Shao, Yingxia
    Cui, Bin
    Zhang, Ce
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (05): : 601 - 612
  • [7] Efficient similarity join for certain graphs
    Ruan, Qunsheng
    Wu, Qingfeng
    Liu, Xiling
    Miao, Fengyu
    Wang, Yingdong
    MICROSYSTEM TECHNOLOGIES-MICRO-AND NANOSYSTEMS-INFORMATION STORAGE AND PROCESSING SYSTEMS, 2021, 27 (04): : 1665 - 1685
  • [8] Efficient similarity join for certain graphs
    Qunsheng Ruan
    Qingfeng Wu
    Xiling Liu
    Fengyu Miao
    Yingdong Wang
    Microsystem Technologies, 2021, 27 : 1665 - 1685
  • [9] Efficient index-free SimRank similarity search in large graphs by discounting path lengths
    Zhang, Mingxi
    Yang, Liuqian
    Hu, Hangfei
    Liu, Tianxing
    Wang, Jinhua
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 206
  • [10] Fast top-k similarity join for SimRank
    Li, Ruiqi
    Zhao, Xiang
    Shang, Haichuan
    Chen, Yifan
    Xiao, Weidong
    INFORMATION SCIENCES, 2017, 381 : 1 - 19