A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning

被引：10

作者：

Talati, Nishil ^{[1
]}

Jin, Di ^{[1
]}

Yet, Haojie ^{[1
]}

Brahmakshatriya, Ajay ^{[2
]}

Dasika, Ganesh ^{[3
]}

Amarasinghe, Saman ^{[2
]}

Mudge, Trevor ^{[1
]}

Koutra, Danai ^{[1
]}

Dreslinski, Ronald ^{[1
]}

机构：

[1] Univ Michigan, Ann Arbor, MI 48109 USA

[2] MIT, Cambridge, MA USA

[3] Adv Micro Devices Inc, Santa Clara, CA USA

来源：

2021 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2021) | 2021年

基金：

美国国家科学基金会;

关键词：

Characterization; CPU; dynamic graph; graph learning; GPU; random walk; temporal graph; word2vec; ALGORITHMS; ANALYTICS; COMPILER;

D O I：

10.1109/IISWC53511.2021.00019

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Machine learning on graph data has gained significant interest because of its applicability to various domains ranging from product recommendations to drug discovery. While there is a rapid growth in the algorithmic community, the computer architecture community has so far focused on a subset of graph learning algorithms including Graph Convolution Network (GCN), and a few others. In this paper, we study another, more scalable, graph learning algorithm based on random walks, which operates on dynamic input graphs and has attracted less attention in the architecture community compared to GCN. We propose high-performance CPU and GPU implementations of two important graph learning tasks, that cover a broad class of applications, using random walks on continuous-time dynamic graphs: link prediction and node classification. We show that the resulting workload exhibits distinct characteristics, measured in terms of irregularity, core and memory utilization, and cache hit rates, compared to graph traversals, deep learning, and GCN. We further conduct an in-depth performance analysis focused on both algorithm and hardware to guide future software optimization and architecture exploration. The algorithm-focused study presents a rich trade-off space between algorithmic performance and runtime complexity to identify optimization opportunities. We find an optimal hyperparameter setting that strikes balance in this trade-off space. Using this setting, we also perform a detailed microarchitectural characterization to analyze hardware behavior of these applications and uncover execution bottlenecks, which include high cache misses and dependency-related stalls. The outcome of our study includes recommendations for further performance optimization, and open-source implementations for future investigation.

引用

页码：87 / 100

页数：14

共 97 条

[31]

Hu Wei, 2020, INT C LEARN REPR

[32] Densely Connected Convolutional Networks [J].

Huang, Gao ;

Liu, Zhuang ;

van der Maaten, Laurens ;

Weinberger, Kilian Q. .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2261-2269

[33] Lethality and centrality in protein networks [J].

Jeong, H ;

Mason, SP ;

Barabási, AL ;

Oltvai, ZN .

NATURE, 2001, 411 (6833) :41-42

[34]

Jin D, 2019, JOINT EUR C MACH LEA

[35] In-Datacenter Performance Analysis of a Tensor Processing Unit [J].

Jouppi, Norman P. ;

Young, Cliff ;

Patil, Nishant ;

Patterson, David ;

Agrawal, Gaurav ;

Bajwa, Raminder ;

Bates, Sarah ;

Bhatia, Suresh ;

Boden, Nan ;

Borchers, Al ;

Boyle, Rick ;

Cantin, Pierre-luc ;

Chao, Clifford ;

Clark, Chris ;

Coriell, Jeremy ;

Daley, Mike ;

Dau, Matt ;

Dean, Jeffrey ;

Gelb, Ben ;

Ghaemmaghami, Tara Vazir ;

Gottipati, Rajendra ;

Gulland, William ;

Hagmann, Robert ;

Ho, C. Richard ;

Hogberg, Doug ;

Hu, John ;

Hundt, Robert ;

Hurt, Dan ;

Ibarz, Julian ;

Jaffey, Aaron ;

Jaworski, Alek ;

Kaplan, Alexander ;

Khaitan, Harshit ;

Killebrew, Daniel ;

Koch, Andy ;

Kumar, Naveen ;

Lacy, Steve ;

Laudon, James ;

Law, James ;

Le, Diemthu ;

Leary, Chris ;

Liu, Zhuyuan ;

Lucke, Kyle ;

Lundin, Alan ;

MacKean, Gordon ;

Maggiore, Adriana ;

Mahony, Maire ;

Miller, Kieran ;

Nagarajan, Rahul ;

Narayanaswami, Ravi .

44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017), 2017, :1-12

[36] Scalable SIMD-Efficient Graph Processing on GPUs [J].

Khorasani, Farzad ;

Gupta, Rajiv ;

Bhuyan, Laxmi N. .

2015 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION (PACT), 2015, :39-50

[37]

Kipf TN, 2016, ARXIV

[38] Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks [J].

Kumar, Srijan ;

Zhang, Xikun ;

Leskovec, Jure .

KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, :1269-1278

[39]

Le-Phuoc D., 2012, INT SEM WEB C

[40]

Leskovec J, 2010, AAAI C WEB SOC MED

← 1 2 3 4 5 6 7 8 9 10 →