HeNCoG: A Heterogeneous Near-memory Computing Architecture for Energy Efficient GCN Acceleration

Cited by: 0

Authors:

Hwang, Seung-Eon [1]

Song, Duyeong [1]

Park, Jongsun [1]

Affiliations:

[1] Korea Univ, Sch Elect Engn, Seoul, South Korea

Source:

2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024 | 2024

Funding:

National Research Foundation, Singapore;

Keywords:

Graph Convolutional Network; Sparse Matrix Multiplication; Near-memory Computing; Domain Specific Accelerator; PERFORMANCE;

DOI:

10.1109/ISCAS58744.2024.10558133

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Graph convolutional network (GCN), which first applies convolutional operations to process graph data, has gained attention in various tasks involving relational data. Previous GCN accelerators have been designed either with heterogeneous cores, reflecting the two stages of inference (aggregation and combination), or with a unified core that treats multi-layer inference as an iterative sparse-dense matrix multiplication. However, those prior works suffer from an unnecessarily large number of multiply-accumulate (MAC) operations and/or main memory accesses. In this paper, we propose HeNCoG, a GCN accelerator that utilizes a heterogeneous MAC array core for the combination stage and a near-memory computing core for the aggregation stage. In HeNCoG, because the number of MAC operations is significantly reduced when the stage execution order is changed, the combination stage is executed first with a row-stationary dataflow. In the aggregation stage, magneto-resistive random-access memory (MRAM)-based near-memory computing is employed to reduce the number of main memory accesses needed to read the adjacency matrix of the graph dataset. Graph partitioning and double buffering techniques are also applied to further improve hardware efficiency. Simulation results show that the HeNCoG architecture reduces execution cycles by 97% and memory accesses by 42% compared to previous works.
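
The abstract's point about stage ordering can be illustrated with a rough MAC count. The sketch below is not taken from the paper: it assumes a single GCN layer of the form A·X·W, with a sparse N×N adjacency matrix A, a feature matrix X of size N×F treated as dense, and a dense F×H weight matrix W. The function names and the example sizes are hypothetical and chosen only for illustration.

```python
def macs_aggregation_first(n, f, h, nnz_a):
    """MAC count for (A @ X) @ W under the dense-X assumption."""
    aggregate = nnz_a * f   # each nonzero of A scales one length-f row of X
    combine = n * f * h     # dense (n x f) result times dense (f x h) weights
    return aggregate + combine


def macs_combination_first(n, f, h, nnz_a):
    """MAC count for A @ (X @ W) under the dense-X assumption."""
    combine = n * f * h     # dense (n x f) features times dense (f x h) weights
    aggregate = nnz_a * h   # each nonzero of A scales one length-h row of X @ W
    return combine + aggregate


# Hypothetical sizes for illustration only (not results from the paper).
n, f, h, nnz_a = 2708, 1433, 16, 13264
agg_first = macs_aggregation_first(n, f, h, nnz_a)
comb_first = macs_combination_first(n, f, h, nnz_a)
print(f"aggregation first: {agg_first} MACs")
print(f"combination first: {comb_first} MACs")
```

Under these assumptions the two orderings differ by nnz(A)·(F − H) MACs, so running combination first helps whenever the output feature width H is smaller than the input width F. If X is itself sparse, the combination cost drops further, which this simplified count does not model.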

References

Pages: 5

16 entries in total

[1] Dong, Qing; Wang, Zhehong; Lim, Jongyup; Zhang, Yiqun; Sinangil, Mahmut E.; Shih, Yi-Chun; Chih, Yu-Der; Chang, Jonathan; Blaauw, David; Sylvester, Dennis. A 1-Mb 28-nm 1T1MTJ STT-MRAM With Single-Cap Offset-Cancelled Sense Amplifier and In Situ Self-Write-Termination [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2019, 54 (01): 231-239
[2] Dong, Xiangyu; Xu, Cong; Xie, Yuan; Jouppi, Norman P. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2012, 31 (07): 994-1007
[3] Fey M., 2020, ARXIV
[4] Gustavson F. G., 1978, ACM Transactions on Mathematical Software, V4, P250, DOI 10.1145/355791.355796
[5] Hamilton W. L., 2017, ADVANCEMENTS NEURAL
[6] Hwang R., 2023, IEEE INT S HIGH PERF
[7] KaHIP, 2023, KAHIP
[8] Kipf T. N., 2017, P ICLR
[9] Li J., 2021, IEEE INT S HIGH PERF
[10] OConnor M., 2014, MEM FOR WORKSH
