Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network

Times cited: 2
Authors
Huang, Linyong [1 ]
Zhang, Zhe [2 ]
Li, Shuangchen [2 ]
Niu, Dimin [2 ]
Guan, Yijin [2 ]
Zheng, Hongzhong [2 ]
Xie, Yuan [2 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310058, Peoples R China
[2] Alibaba Grp, Hangzhou 311121, Peoples R China
Keywords
Graph neural network; large-scale graph processing; memory pool; near data processing;
DOI
10.1109/ACCESS.2022.3169423
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Graph neural networks (GNNs) have drawn tremendous attention in the past few years due to their convincing performance and high interpretability in various graph-based tasks such as link prediction and node classification. With ever-growing graph sizes in the real world, especially industrial graphs at the billion scale, graph storage can easily consume terabytes, so GNNs have to be processed in a distributed manner. As a result, execution can be inefficient due to expensive cross-node communication and irregular memory access. Various GNN accelerators have been proposed for efficient GNN processing, but they mainly target small and medium-size graphs and are not applicable to large-scale distributed graphs. In this paper, we present a practical Near-Data-Processing (NDP) architecture based on a memory-pool system for large-scale distributed GNNs. We propose a customized memory fabric interface to construct the memory pool for low-latency, high-throughput cross-node communication, providing flexible memory allocation and strong scalability. A practical NDP design is proposed for efficient work offloading and improved bandwidth utilization. Moreover, we introduce a partition and scheduling scheme to further improve performance and achieve workload balance. Comprehensive evaluations demonstrate that the proposed architecture achieves up to 27x and 8x higher training speed than two state-of-the-art distributed GNN frameworks, Deep Graph Library and P3, respectively.
Pages: 46796-46807
Number of pages: 12
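
To make the abstract's partition-and-aggregate idea concrete, the following is a minimal, hypothetical Python sketch, not taken from the paper: it hash-partitions a random graph, runs one round of mean neighbor aggregation (the gather step that near-data-processing logic attached to each memory partition could offload), and counts cross-partition edges as a rough proxy for the traffic the memory-pool fabric would carry. All names, sizes, and the hash partitioning below are illustrative assumptions, not the paper's partition and scheduling scheme.

    # Hypothetical sketch, not the paper's implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    num_nodes, feat_dim, num_parts = 1000, 16, 4  # illustrative sizes

    # A random directed edge list and node features stand in for a real graph.
    edges = rng.integers(0, num_nodes, size=(5000, 2))
    features = rng.standard_normal((num_nodes, feat_dim)).astype(np.float32)

    # Naive hash partition of nodes; the paper's dedicated partition and
    # scheduling scheme is not reproduced here.
    part_of = np.arange(num_nodes) % num_parts

    # One round of mean neighbor aggregation (the GNN "gather" step).
    # In an NDP design, each partition's share of this loop would run close
    # to the memory holding that partition's features.
    aggregated = np.zeros_like(features)
    degree = np.zeros(num_nodes, dtype=np.float32)
    for src, dst in edges:
        aggregated[dst] += features[src]
        degree[dst] += 1.0
    aggregated /= np.maximum(degree, 1.0)[:, None]

    # Cross-partition edges approximate the remote traffic that a low-latency
    # memory fabric between nodes would have to absorb.
    cross = int(np.sum(part_of[edges[:, 0]] != part_of[edges[:, 1]]))
    print("cross-partition edges:", cross, "of", len(edges))

A better partitioner (for example, one that minimizes edge cut while balancing per-partition work) would reduce the cross-partition count printed above, which is the kind of communication and load-balance trade-off the paper's partition and scheduling scheme targets.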