An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Authors
Wang, Runze [1, 2, 3]
Hu, Ao [1, 2, 3]
Zheng, Long [1, 2, 3]
Wang, Qinggang [1, 2, 3]
Yuan, Jingrui [1, 2, 3]
Liu, Haifeng [1, 2, 3]
Yu, Linchen [4]
Liao, Xiaofei [1, 2]
Jin, Hai [1, 2]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
Keywords
3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM)
DOI
10.1109/TCAD.2023.3341753
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graph convolutional networks (GCNs) hold great promise in facilitating machine learning on graph-structured data. However, the sparsity of graphs often results in a significant number of irregular memory accesses, leading to inefficient data movement for existing GCN accelerators. With the advancement of 3D-stacked technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators are confronted with the challenges of irregular remote access in the aggregation phase of GCNs and dynamic workload variations between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in terms of the computation model and hardware designs. First, we present a PIM-based hybrid computation model, which employs a remote merging strategy to achieve the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting these three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with the state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves an average speedup of 3,736.06x and 76.56x, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators, I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35x, 8.97x, 2.24x, and 5.58x, respectively.
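The two dataflows named in the abstract can be illustrated in software. The following is a minimal sketch, not the GCNim design: it assumes a single GCN layer computed as (A · X) · W with aggregation before combination, uses plain Python lists, and does not model PIM banks, remote merging, or the pipeline. All function and parameter names (`aggregate_outer_product`, `combine_row_wise`, `adj_cols`, `num_nodes`) are hypothetical.

```python
def aggregate_outer_product(adj_cols, features, num_nodes):
    """Compute A @ X in outer-product order (assumed illustration).

    adj_cols[k] holds the nonzeros of column k of A as (row, value) pairs.
    Column k of A is paired with row k of X; each pair yields a partial
    result that is merged (accumulated) into the output rows it touches.
    """
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for k, column in enumerate(adj_cols):
        x_row = features[k]
        for i, a_ik in column:
            for j in range(dim):
                out[i][j] += a_ik * x_row[j]  # merge partial product into row i
    return out


def combine_row_wise(hidden, weights):
    """Compute H @ W in row-wise order (assumed illustration).

    Each output row is built independently by scaling rows of W with the
    entries of the matching row of H, skipping zero entries.
    """
    out_dim = len(weights[0])
    out = []
    for h_row in hidden:
        acc = [0.0] * out_dim
        for k, h_ik in enumerate(h_row):
            if h_ik == 0.0:
                continue
            w_row = weights[k]
            for j in range(out_dim):
                acc[j] += h_ik * w_row[j]
        out.append(acc)
    return out


def gcn_layer(adj_cols, features, weights, num_nodes):
    """One layer without activation: aggregation, then combination."""
    hidden = aggregate_outer_product(adj_cols, features, num_nodes)
    return combine_row_wise(hidden, weights)


if __name__ == "__main__":
    # Path graph 0-1-2 with self-loops, stored column by column.
    adj_cols = [
        [(0, 1.0), (1, 1.0)],
        [(0, 1.0), (1, 1.0), (2, 1.0)],
        [(1, 1.0), (2, 1.0)],
    ]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = [[1.0, 2.0], [3.0, 4.0]]
    print(gcn_layer(adj_cols, features, weights, num_nodes=3))
```

In this sketch the outer-product order reads each feature row once and scatters updates to several output rows, while the row-wise order finishes one output row at a time; which phase benefits from which order in hardware is a claim of the paper, not something this example demonstrates.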
Pages: 1360-1373
Number of pages: 14
Related papers
50 in total
  • [1] NeuroPIM: Flexible Neural Accelerator for Processing-in-Memory Architectures
    Bidgoli, Ali Monavari
    Fattahi, Sepideh
    Rezaei, Seyyed Hossein Seyyedaghaei
    Modarressi, Mehdi
    Daneshtalab, Masoud
    2023 26TH INTERNATIONAL SYMPOSIUM ON DESIGN AND DIAGNOSTICS OF ELECTRONIC CIRCUITS AND SYSTEMS, DDECS, 2023, : 51 - 56
  • [2] 3D-Stacked memory architectures for multi-core processors
    Loh, Gabriel H.
    ISCA 2008 PROCEEDINGS: 35TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 2008, : 453 - 464
  • [3] GCIM: Toward Efficient Processing of Graph Convolutional Networks in 3D-Stacked Memory
    Chen, Jiaxian
    Lin, Yiquan
    Sun, Kaoyi
    Chen, Jiexin
    Ma, Chenlin
    Mao, Rui
    Wang, Yi
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (11) : 3579 - 3590
  • [4] Data Reorganization in Memory Using 3D-stacked DRAM
    Akin, Berkin
    Franchetti, Franz
    Hoe, James C.
    2015 ACM/IEEE 42ND ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2015, : 131 - 143
  • [5] ApproxPIM: Exploiting Realistic 3D-stacked DRAM for Energy-Efficient Processing In-memory
    Tang, Yibin
    Wang, Ying
    Li, Huawei
    Li, Xiaowei
    2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 396 - 401
  • [6] MAC: Memory Access Coalescer for 3D-Stacked Memory
    Wang, Xi
    Tumeo, Antonino
    Leidel, John D.
    Li, Jie
    Chen, Yong
    PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [7] Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and Optimizations
    Iskandar, Veronia
    Abd El Ghany, Mohamed A.
    Goehringer, Diana
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2023, 16 (01)
  • [8] Towards Near-Data Processing of Compare Operations in 3D-Stacked Memory
    Das, Palash
    Kapoor, Hemangee K.
    PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018, : 243 - 248
  • [9] A 3D-Stacked Logic-in-Memory Accelerator for Application-Specific Data Intensive Computing
    Zhu, Qiuling
    Akin, Berkin
    Sumbul, H. Ekin
    Sadi, Fazle
    Hoe, James C.
    Pileggi, Larry
    Franchetti, Franz
    2013 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC), 2013,
  • [10] Design space exploration for PIM architectures in 3D-stacked memories
    de Lima, Joao Paulo C.
    Santos, Paulo Cesar
    Alves, Marco A. Z.
    Beck, Antonio C. S.
    Carro, Luigi
    2018 ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS, 2018, : 113 - 120