SCV-GNN: Sparse Compressed Vector-Based Graph Neural Network Aggregation

Cited by: 1
Authors
Unnikrishnan, Nanda K. [1 ]
Gould, Joe [1 ]
Parhi, Keshab K. [1 ]
Affiliations
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
Funding
U.S. National Science Foundation
Keywords
Sparse matrices; graph neural networks (GNNs); hardware; memory management; indexes; vector processors; accelerator architectures; neural networks; aggregation; neural network inference
DOI
10.1109/TCAD.2023.3291672
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Graph neural networks (GNNs) have emerged as a powerful tool to process graph-based data in fields like communication networks, molecular interactions, chemistry, social networks, and neuroscience. GNNs are characterized by the ultrasparse nature of their adjacency matrices, which necessitates dedicated hardware beyond general-purpose sparse matrix multipliers. While there has been extensive research on designing dedicated hardware accelerators for GNNs, few works have explored the impact of the sparse storage format on the efficiency of GNN accelerators. This article proposes SCV-GNN with the novel sparse compressed vectors (SCVs) format optimized for the aggregation operation. We use Z-Morton ordering to derive a data-locality-based computation ordering and partitioning scheme. This article also shows how the proposed SCV-GNN scales on a vector processing system. Experimental results over various datasets show that the proposed method achieves a geometric mean speedup of 7.96× and 7.04× over compressed sparse column (CSC) and compressed sparse row (CSR) aggregation operations, respectively. The proposed method also reduces memory traffic by factors of 3.29× and 4.37× over CSC and CSR, respectively. Thus, the proposed aggregation format reduces both the latency and the memory accesses of GNN inference.
Pages: 4803 - 4816
Number of pages: 14
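
The SCV format itself is defined only in the full article and is not reproduced in this record. As a rough illustration of the operations the abstract refers to, below is a minimal Python sketch, with assumed toy sizes, of the CSR aggregation baseline (the sparse-dense product A·X that SCV-GNN is compared against) together with a hypothetical zmorton_key helper showing the bit interleaving behind Z-Morton ordering; none of this code comes from the paper.

    # Minimal sketch (not the paper's SCV implementation): GNN aggregation as the
    # sparse-dense product A @ X using SciPy's CSR format -- the baseline the
    # abstract compares against -- plus a Z-Morton (bit-interleaved) key that can
    # be used to order nonzeros for data locality.
    import numpy as np
    from scipy.sparse import random as sparse_random

    def zmorton_key(row: int, col: int, bits: int = 16) -> int:
        """Interleave the bits of (row, col) into a single Z-Morton code."""
        key = 0
        for b in range(bits):
            key |= ((row >> b) & 1) << (2 * b + 1)
            key |= ((col >> b) & 1) << (2 * b)
        return key

    # Toy ultrasparse adjacency matrix A and dense node-feature matrix X
    # (sizes are illustrative, not taken from the paper).
    n, f = 1024, 64
    A = sparse_random(n, n, density=0.001, format="csr", random_state=0)
    X = np.random.rand(n, f).astype(np.float32)

    # CSR aggregation baseline: row i of H accumulates the features of i's neighbors.
    H = A @ X  # H[i] = sum_j A[i, j] * X[j]

    # Reordering the nonzeros along the Z-order curve keeps nearby rows and columns
    # (and hence nearby rows of X and H) close together in the processing order,
    # which is the data-locality idea the abstract attributes to Z-Morton ordering.
    coo = A.tocoo()
    order = np.argsort([zmorton_key(int(r), int(c)) for r, c in zip(coo.row, coo.col)])
    for r, c, v in zip(coo.row[order], coo.col[order], coo.data[order]):
        pass  # an accelerator would stream (r, c, v) and accumulate v * X[c] into H[r]

In the paper, the same Z-order traversal additionally drives the partitioning of work across vector lanes; those details, and the SCV storage layout itself, are described only in the article.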