An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Times cited: 0

Authors:

Wang, Runze [1,2,3]

Hu, Ao [1,2,3]

Zheng, Long [1,2,3]

Wang, Qinggang [1,2,3]

Yuan, Jingrui [1,2,3]

Liu, Haifeng [1,2,3]

Yu, Linchen [4]

Liao, Xiaofei [1,2]

Jin, Hai [1,2]

Affiliations:

[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China

[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China

[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024 / Vol. 43 / No. 5

Keywords:

3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM)

DOI:

10.1109/TCAD.2023.3341753

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Graph convolutional networks (GCNs) hold great promise for machine learning on graph-structured data. However, the sparsity of graphs often causes a large number of irregular memory accesses, leading to inefficient data movement in existing GCN accelerators. With the advancement of 3D-stacked technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators face two challenges: irregular remote accesses in the aggregation phase of GCNs and dynamic workload variations between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in its computation model and hardware design. First, we present a PIM-based hybrid computation model that employs a remote merging strategy to realize the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting all three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves average speedups of 3,736.06× and 76.56×, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35×, 8.97×, 2.24×, and 5.58×, respectively.
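The abstract contrasts two matrix-product formulations: an outer product for aggregation and a row-wise product for combination. As a rough software illustration only, assuming a GCN layer computed as (A X) W with the nonlinearity omitted, the sketch below implements both orderings on a toy graph and checks them against an ordinary inner-product matrix multiply. All names (`aggregate_outer_product`, `combine_row_wise`, the example matrices) are hypothetical, and the code does not model GCNim's remote merging strategy, bank-level PEs, three-stage pipeline, or data placement algorithm.

```python
from typing import Dict, List, Tuple

# Sketch only: a software view of the two product orders, not GCNim's hardware dataflow.
Matrix = List[List[float]]
# Sparse adjacency stored column by column: column k -> [(row i, value A[i][k]), ...]
SparseCols = Dict[int, List[Tuple[int, float]]]


def aggregate_outer_product(adj_cols: SparseCols, x: Matrix, n: int) -> Matrix:
    """Z = A @ X as a sum of outer products A[:, k] (x) X[k, :]."""
    f = len(x[0])
    z = [[0.0] * f for _ in range(n)]
    for k, column in adj_cols.items():
        xk = x[k]  # feature row of source vertex k, fetched once per column
        for i, a_ik in column:
            # Scatter a partial product into destination row i; partial results for
            # the same row arrive from different columns and are summed (merged) here.
            row = z[i]
            for j in range(f):
                row[j] += a_ik * xk[j]
    return z


def combine_row_wise(z: Matrix, w: Matrix) -> Matrix:
    """Y = Z @ W row by row: Y[i, :] = sum_k Z[i][k] * W[k, :] (Gustavson-style)."""
    out_dim = len(w[0])
    y: Matrix = []
    for z_row in z:
        acc = [0.0] * out_dim
        for k, z_ik in enumerate(z_row):
            if z_ik == 0.0:
                continue  # zero entries contribute nothing, so skip their W row
            w_row = w[k]
            for j in range(out_dim):
                acc[j] += z_ik * w_row[j]
        y.append(acc)
    return y


def dense_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference inner-product matmul used only to check the two dataflows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical 3-vertex path graph with self-loops (unnormalized for readability).
A_DENSE = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
A_COLS: SparseCols = {
    k: [(i, float(A_DENSE[i][k])) for i in range(3) if A_DENSE[i][k] != 0]
    for k in range(3)
}
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 2 input features per vertex
W = [[1.0, -1.0], [2.0, 0.0]]              # 2 -> 2 weight matrix

Y = combine_row_wise(aggregate_outer_product(A_COLS, X, n=3), W)
print(Y)  # [[16.0, -4.0], [33.0, -9.0], [28.0, -8.0]]
```

In this toy version, the outer-product form reads each source feature row once and scatters partial sums to several destination rows, while the row-wise form produces each output row independently of the others; how these properties are exploited across 3D-stacked memory banks is not modeled here.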

Pages: 1360-1373

Number of pages: 14

References (50 in total; entries 1-10 shown):

[1] NeuroPIM: Flexible Neural Accelerator for Processing-in-Memory Architectures
Bidgoli, Ali Monavari; Fattahi, Sepideh; Rezaei, Seyyed Hossein Seyyedaghaei; Modarressi, Mehdi; Daneshtalab, Masoud
2023 26TH INTERNATIONAL SYMPOSIUM ON DESIGN AND DIAGNOSTICS OF ELECTRONIC CIRCUITS AND SYSTEMS, DDECS, 2023: 51-56
[2] 3D-Stacked memory architectures for multi-core processors
Loh, Gabriel H.
ISCA 2008 PROCEEDINGS: 35TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 2008: 453-464
[3] GCIM: Toward Efficient Processing of Graph Convolutional Networks in 3D-Stacked Memory
Chen, Jiaxian; Lin, Yiquan; Sun, Kaoyi; Chen, Jiexin; Ma, Chenlin; Mao, Rui; Wang, Yi
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11): 3579-3590
[4] Data Reorganization in Memory Using 3D-stacked DRAM
Akin, Berkin; Franchetti, Franz; Hoe, James C.
2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015: 131-143
[5] ApproxPIM: Exploiting Realistic 3D-stacked DRAM for Energy-Efficient Processing In-memory
Tang, Yibin; Wang, Ying; Li, Huawei; Li, Xiaowei
2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017: 396-401
[6] MAC: Memory Access Coalescer for 3D-Stacked Memory
Wang, Xi; Tumeo, Antonino; Leidel, John D.; Li, Jie; Chen, Yong
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019
[7] Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and Optimizations
Iskandar, Veronia; Abd El Ghany, Mohamed A.; Goehringer, Diana
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2023, 16 (01)
[8] Towards Near-Data Processing of Compare Operations in 3D-Stacked Memory
Das, Palash; Kapoor, Hemangee K.
PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018: 243-248
[9] A 3D-Stacked Logic-in-Memory Accelerator for Application-Specific Data Intensive Computing
Zhu, Qiuling; Akin, Berkin; Sumbul, H. Ekin; Sadi, Fazle; Hoe, James C.; Pileggi, Larry; Franchetti, Franz
2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC), 2013
[10] Design space exploration for PIM architectures in 3D-stacked memories
de Lima, Joao Paulo C.; Santos, Paulo Cesar; Alves, Marco A. Z.; Beck, Antonio C. S.; Carro, Luigi
2018 ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS, 2018: 113-120
