A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning

被引:10
作者
Talati, Nishil [1 ]
Jin, Di [1 ]
Yet, Haojie [1 ]
Brahmakshatriya, Ajay [2 ]
Dasika, Ganesh [3 ]
Amarasinghe, Saman [2 ]
Mudge, Trevor [1 ]
Koutra, Danai [1 ]
Dreslinski, Ronald [1 ]
机构
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] MIT, Cambridge, MA USA
[3] Adv Micro Devices Inc, Santa Clara, CA USA
来源
2021 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2021) | 2021年
基金
美国国家科学基金会;
关键词
Characterization; CPU; dynamic graph; graph learning; GPU; random walk; temporal graph; word2vec; ALGORITHMS; ANALYTICS; COMPILER;
D O I
10.1109/IISWC53511.2021.00019
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Machine learning on graph data has gained significant interest because of its applicability to various domains ranging from product recommendations to drug discovery. While there is a rapid growth in the algorithmic community, the computer architecture community has so far focused on a subset of graph learning algorithms including Graph Convolution Network (GCN), and a few others. In this paper, we study another, more scalable, graph learning algorithm based on random walks, which operates on dynamic input graphs and has attracted less attention in the architecture community compared to GCN. We propose high-performance CPU and GPU implementations of two important graph learning tasks, that cover a broad class of applications, using random walks on continuous-time dynamic graphs: link prediction and node classification. We show that the resulting workload exhibits distinct characteristics, measured in terms of irregularity, core and memory utilization, and cache hit rates, compared to graph traversals, deep learning, and GCN. We further conduct an in-depth performance analysis focused on both algorithm and hardware to guide future software optimization and architecture exploration. The algorithm-focused study presents a rich trade-off space between algorithmic performance and runtime complexity to identify optimization opportunities. We find an optimal hyperparameter setting that strikes balance in this trade-off space. Using this setting, we also perform a detailed microarchitectural characterization to analyze hardware behavior of these applications and uncover execution bottlenecks, which include high cache misses and dependency-related stalls. The outcome of our study includes recommendations for further performance optimization, and open-source implementations for future investigation.
引用
收藏
页码:87 / 100
页数:14
相关论文
共 97 条
[1]   EmptyHeaded: A Relational Engine for Graph Processing [J].
Aberger, Christopher R. ;
Lamb, Andrew ;
Tu, Susan ;
Noetzli, Andres ;
Olukotun, Kunle ;
Re, Christopher .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2017, 42 (04)
[2]  
Aggarwal C.C., 2010, P SIAM INT C DATA MI, P478
[3]   Evolutionary Network Analysis: A Survey [J].
Aggarwal, Charu ;
Subbian, Karthik .
ACM COMPUTING SURVEYS, 2014, 47 (01)
[4]  
Ahmed NK, 2019, P DLG KDD, P1, DOI DOI 10.1145/3287098
[5]  
Ainsworth Sam, 2016, P 2016 INT C SUP ICS, DOI [DOI 10.1145/2925426.2926254, 10.1145/ 2925426.2926254]
[6]  
Backstrom L., 2011, P 4 ACM INT C WEB SE, P635
[7]  
Baruah T., 2021, ISPASS
[8]   Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads [J].
Basak, Abanti ;
Li, Shuangchen ;
Hu, Xing ;
Oh, Sang Min ;
Xie, Xinfeng ;
Zhao, Li ;
Jiang, Xiaowei ;
Xie, Yuan .
2019 25TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2019, :373-386
[9]  
Beamer Scott, 2015, ARXIV150803619CSDC
[10]  
Brahmakshatriya A., 2021, CGO