Memory Efficient Graph Convolutional Network based Distributed Link Prediction

Cited by: 1
Authors
Senevirathne, Damitha [1]
Wijesiri, Isuru [1]
Dehigaspitiya, Suchitha [1]
Dayarathna, Miyuru [1,2]
Jayasena, Sanath [1]
Suzumura, Toyotaro [3,4,5]
Affiliations
[1] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka
[2] WSO2 Inc, Mountain View, CA USA
[3] IBM TJ Watson Res Ctr, New York, NY USA
[4] MIT IBM Watson AI Lab, Cambridge, MA USA
[5] Barcelona Supercomp Ctr, Barcelona, Spain
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Keywords
Machine Learning; Graph Databases; Distributed Databases; Graph Theory; Graph Convolutional Neural Networks; GraphSAGE; Deep Learning; Distributed Learning; Link Prediction;
DOI
10.1109/BigData50022.2020.9377874
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Graph Convolutional Networks (GCNs) have found multiple applications in graph-based machine learning. However, training GCNs on large graphs with billions of nodes and edges and rich node attributes consumes a significant amount of time and memory, which makes it impossible to train such GCNs on general-purpose commodity hardware. Such use cases demand high-end servers with accelerators and ample amounts of memory. In this paper we implement memory-efficient GCN-based link prediction on top of a distributed graph database server called JasmineGraph(1). Our approach is based on federated training on partitioned graphs with multiple parallel workers. We conduct experiments with three real-world graph datasets: DBLP-V11, Reddit, and Twitter. We demonstrate that our approach produces optimal performance for a given hardware setting. JasmineGraph was able to train a GCN on the largest dataset, DBLP-V11 (>10 GB), in 20 hours and 24 minutes for 5 training rounds and 3 epochs by partitioning it into 16 partitions with 2 workers on a single server, while the conventional training method could not process the dataset at all due to lack of memory. The second largest dataset, Reddit, took 9 hours and 8 minutes to train with conventional training, whereas JasmineGraph took only 3 hours and 11 minutes with 8 partitions and 4 workers on the same hardware, a roughly 3x improvement. On the Twitter dataset JasmineGraph achieved a 5x improvement (10 hours 31 minutes vs. 2 hours 6 minutes; 16 partitions and 16 workers).
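The core idea in the abstract (one GraphSAGE-style link-prediction model trained per graph partition, with the workers' weights merged between training rounds) can be illustrated with a short, self-contained sketch. The sketch below is not the authors' JasmineGraph implementation; it assumes PyTorch, uses a toy random graph in place of the real partitioner output, performs FedAvg-style weight averaging as the merge step, and uses illustrative names such as SageLayer, LinkPredictor, train_partition, and federated_average. Worker training is run sequentially here for brevity, whereas the paper's workers run in parallel.

# Minimal sketch (not the authors' code): per-partition GraphSAGE link prediction
# with federated averaging of worker weights after each round. All names and
# hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn

class SageLayer(nn.Module):
    """Mean-aggregator GraphSAGE layer: concat(self, mean(neighbours)) -> linear."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg                       # mean of neighbour features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))

class LinkPredictor(nn.Module):
    """Two SAGE layers followed by a dot-product edge scorer."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.l1, self.l2 = SageLayer(in_dim, hid_dim), SageLayer(hid_dim, hid_dim)

    def forward(self, x, adj, src, dst):
        h = self.l2(self.l1(x, adj), adj)
        return (h[src] * h[dst]).sum(dim=1)         # one logit per candidate edge

def train_partition(model, part, epochs=3, lr=0.01):
    """Local training on one partition: positive edges vs. random negative edges."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    x, adj, pos = part["x"], part["adj"], part["pos_edges"]
    for _ in range(epochs):
        neg = torch.randint(0, x.size(0), pos.shape)          # naive negative sampling
        src = torch.cat([pos[0], neg[0]]); dst = torch.cat([pos[1], neg[1]])
        y = torch.cat([torch.ones(pos.size(1)), torch.zeros(neg.size(1))])
        opt.zero_grad()
        loss = loss_fn(model(x, adj, src, dst), y)
        loss.backward(); opt.step()
    return model.state_dict()

def federated_average(states):
    """Element-wise mean of the workers' parameter tensors (FedAvg-style merge)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

def make_toy_partition(n=64, feat=8, edges=128):
    """Random stand-in for one graph partition produced by the graph partitioner."""
    pos = torch.randint(0, n, (2, edges))
    adj = torch.zeros(n, n)
    adj[pos[0], pos[1]] = 1.0; adj[pos[1], pos[0]] = 1.0
    return {"x": torch.randn(n, feat), "adj": adj, "pos_edges": pos}

if __name__ == "__main__":
    partitions = [make_toy_partition() for _ in range(4)]     # e.g. 4 workers
    global_model = LinkPredictor(in_dim=8, hid_dim=16)
    for rnd in range(5):                                       # 5 training rounds
        states = []
        for part in partitions:                                # sequential here only
            local = copy.deepcopy(global_model)
            states.append(train_partition(local, part))
        global_model.load_state_dict(federated_average(states))
        print(f"round {rnd + 1}: merged {len(states)} worker models")

A dot-product scorer over the two endpoint embeddings is the simplest link-prediction head, and averaging the workers' state_dicts after each round keeps every worker's memory footprint bounded by its own partition, which mirrors the memory-efficiency argument made in the abstract.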
Pages: 2977 - 2986
Page count: 10