Bipartite-Oriented Distributed Graph Partitioning for Big Learning

被引:9
作者
Chen, Rong [1 ]
Shi, Jia-Xin [1 ]
Chen, Hai-Bo [1 ]
Zang, Bin-Yu [1 ]
机构
[1] Shanghai Jiao Tong Univ, Shanghai Key Lab Scalable Comp & Syst, Inst Parallel & Distributed Syst, Shanghai 200240, Peoples R China
基金
新加坡国家研究基金会; 中国国家自然科学基金; 上海市科技启明星计划;
关键词
bipartite graph; graph partitioning; graph-parallel system;
D O I
10.1007/s11390-015-1501-x
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Many machine learning and data mining (MLDM) problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on Power Graph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithms, due to reducing up to 80% vertex replication, and up to 96% network traffic.
引用
收藏
页码:20 / 29
页数:10
相关论文
共 21 条
  • [1] [Anonymous], 2014, P APSYS
  • [2] [Anonymous], 2006, P 20 INT C PAR DISTR
  • [3] [Anonymous], 2012, PROC 10 USENIX C OPE
  • [4] The anatomy of a large-scale hypertextual Web search engine
    Brin, S
    Page, L
    [J]. COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7): : 107 - 117
  • [5] Chen R., 2014, Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC '14, P215
  • [6] Chen R, 2013, TECHNICAL REPORT
  • [7] Dhillon I. S., 2001, KDD-2001. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P269, DOI 10.1145/502512.502550
  • [8] Elsosser R., 2001, P 13 ANN ACM S PAR A, P255
  • [9] Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning
    Gao, B
    Liu, TY
    Feng, G
    Qin, T
    Cheng, QS
    Ma, WY
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (09) : 1263 - 1273
  • [10] Gao B., 2005, P 11 INT C KNOWLEDGE, P41, DOI DOI 10.1145/1081870.1081879