Asyn-SimRank: An asynchronous large-scale simrank algorithm

被引：0

作者：

Wang, Chunlei ^{[1
]}

Zhang, Yanfeng ^{[2
]}

Bao, Yubin ^{[1
]}

Zhao, Changkuan ^{[2
]}

Yu, Ge ^{[1
]}

Gao, Lixin ^{[3
]}

机构：

[1] Institute of Computer Software, College of Information Science and Engineering, Northeastern University, Shenyang

[2] Computer Center, Northeastern University, Shenyang

[3] Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst

来源：

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2015年 / 52卷 / 07期

关键词：

Asyn-SimRank; Asynchronous computation; Big data; Iterative computation; Maiter; MapReduce; Similarity;

D O I：

10.7544/issn1000-1239.2015.20140313

中图分类号：

学科分类号：

摘要：

The SimRank algorithm, which exploits network structure to measure the similarity between node pairs, has been widely used in many areas, such as online social networks and link prediction. In recent years, with the development of big data, the input data set of the SimRank algorithm is constantly increasing. People are utilizing distributed computing models, such as MapReduce, to design large-scale SimRank algorithm for solving the big data problems. However, since SimRank algorithm contains a high-cost iterative process with synchronization barriers between iterations and the computational complexity is high in each iteration, the large-scale SimRank computation does not result in the satisfactory performance. In this paper, 1) We propose Asyn-SimRank by employing the iterate-cumulate approach,which asynchronously executes the core iterative computation to avoid the high-cost synchronization barriers in large-scale distributed environments, and effectively reduces the amount of computation and communication. 2) We further propose the keypoint priority scheduling mechanism to accelerate convergence. 3) We prove the accuracy and convergence property of Asyn-SimRank as well as the efficiency of the keypoint priority scheduling. 4) We then implement Asyn-SimRank on Maiter, which is a distributed framework supporting asynchronous iteration. Expremental results show that, compared with the SimRank and Delta-SimRank implementing on Hadoop and Spark, the large-scale Asyn-SimRank significantly promotes the computational efficiency and accelerates the convergence. ©, 2015, Science Press. All right reserved.

引用

页码：1567 / 1579

页数：12

共 12 条

[1] Jeh G., Widom J., Simrank: A measure of structural-sontext similarity, Proc of the 8th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining, pp. 538-543, (2002)
[2] Dean J., Ghemawat S., Et al., MapReduce: Simplified data processing on large clusters, Communications of the ACM, 51, 1, pp. 107-113, (2004)
[3] Shvachko K., Kuang H., Et al., The hadoop distributed file system, Proc of the 2010 IEEE 26th Symp on Mass Storage Systems and Technologies, pp. 1-10, (2010)
[4] Cao L., Cho B., Tsai M., Et al., Delta-SimRank computing on mapreduce, Proc of the 1st Int Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms Systems Programming Models and Applications, pp. 28-35, (2012)
[5] Zhang Y., Gao Q., Et al., Accelerate large-scale iterative computation through asynchronous accumulative updates, Proc of the 3rd Workshop on Scientific Cloud Computing, pp. 13-22, (2012)
[6] Zaharia M., Chowdhury M., Franklin M.J., Et al., Spark: Cluster computing with working sets, Proc of the 2nd USENIX Conf on Hot Topics in Cloud Computing, (2010)
[7] Bu Y., Howe B., Balazinska M., Et al., Haloop: Efficient iterative data processing on large clusters, VLDB Endowment, 3, 1-2, pp. 285-296, (2010)
[8] Zhang Y., Gao Q., Et al., IMapReduce: A distributed computing framework for iterative computation, Proc of the 2011 IEEE Int Symp on Parallel and Distributed Processing, pp. 1112-1121, (2011)
[9] Zhang Y., Gao Q., Et al., PrIter: A distributed framework for prioritized iterative computations, Proc of the 2nd ACM Symp on Cloud Computing, pp. 13-14, (2011)
[10] Zhang Y., Li C., Et al., Fast simrank computation over disk-resident graphs, Proc of the 18th Int Conf on Database Systems for Advanced Applications, pp. 16-30, (2013)

← 1 2 →