Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Cited by: 49
Authors
Cong, Weilin [1 ]
Forsati, Rana [2 ]
Kandemir, Mahmut [1 ]
Mahdavi, Mehrdad [1 ]
Affiliations
[1] Penn State University, University Park, PA 16802, USA
[2] Microsoft Bing, Bellevue, WA, USA
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020
Keywords
Graph neural networks; minimal variance sampling
DOI
10.1145/3394486.3403192
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and achieves better generalization than existing methods.
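To make the gradient-based sampling idea from the abstract concrete, below is a minimal Python/NumPy sketch (not the authors' implementation) of importance sampling with probabilities proportional to per-node gradient norms; this proportional choice is the classical variance minimizer for an unbiased importance-sampled gradient estimator. The function names and the use of stale gradient-norm estimates are illustrative assumptions.

```python
import numpy as np

def minimal_variance_probabilities(grad_norms):
    """Sampling probabilities p_i proportional to per-node gradient norms.

    For the unbiased estimator sum_k grad[i_k] / (m * p[i_k]) of the full
    gradient sum_i grad[i], taking p_i proportional to ||grad_i|| is the
    classical variance-minimizing distribution.
    """
    return grad_norms / grad_norms.sum()

def sample_minibatch(grad_norms, batch_size, rng):
    """Draw node indices plus the importance weights that keep the
    mini-batch gradient estimator unbiased."""
    p = minimal_variance_probabilities(grad_norms)
    idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
    weights = 1.0 / (batch_size * p[idx])  # reweight each sampled gradient
    return idx, weights

# Illustrative usage with stale (approximate) gradient-norm estimates,
# e.g., cached from a previous epoch.
rng = np.random.default_rng(0)
grad_norms = np.abs(rng.standard_normal(1000)) + 1e-12
idx, w = sample_minibatch(grad_norms, batch_size=64, rng=rng)
# sum(w[k] * grad[idx[k]] for k) is an unbiased estimate of sum_i grad[i].
```

Note that this sketch only addresses the backward-stage stochastic gradient variance; the decoupled strategy described in the abstract additionally controls the embedding approximation variance arising in the forward pass.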
Pages: 1393-1403
Page count: 11