GAS: General-Purpose In-Memory-Computing Accelerator for Sparse Matrix Multiplication

Cited by: 1
Authors
Zhang, Xiaoyu [1, 2]
Li, Zerun [1, 2]
Liu, Rui [1, 3]
Chen, Xiaoming [1, 2]
Han, Yinhe [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
[3] Xiangtan Univ, Sch Mat Sci & Engn, Xiangtan 411105, Hunan, Peoples R China
Keywords
Sparse matrices; FeFETs; Computer architecture; Arrays; Nonvolatile memory; Microprocessors; Vectors; Sparse matrix multiplication; in-memory computing; SpMV; SpMSpV; SpMM; SpMSpM
DOI
10.1109/TC.2024.3371790
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse matrix multiplication is widely used in various practical applications. Different accelerators have been proposed to speed up sparse matrix-dense vector multiplication (SpMV), sparse matrix-sparse vector multiplication (SpMSpV), sparse matrix-dense matrix multiplication (SpMM), and sparse matrix-sparse matrix multiplication (SpMSpM). The performance of traditional sparse matrix multiplication accelerators is typically bounded by memory access due to poor data locality and irregular memory access patterns. In-memory computing (IMC) is a promising technique to alleviate the memory bottleneck. Previous IMC studies have mostly focused on accelerating a single sparse matrix multiplication function. In this paper, we propose GAS, a general-purpose IMC accelerator for sparse matrix multiplication. GAS integrates non-volatile-memory-based content-addressable memory (CAM) arrays and multiply-add computation (MAC) arrays to support sparse matrices represented in the double-precision floating-point format. Using a unified outer-product-based multiplication methodology, GAS supports the acceleration of SpMV, SpMSpV, SpMM, and SpMSpM. We further propose four optimization techniques to speed up the computation of GAS. GAS achieves significant speedups and energy savings over central processing unit (CPU) and graphics processing unit (GPU) implementations. Compared with state-of-the-art traditional and IMC-based accelerators, GAS not only supports more functions, but also achieves higher performance and energy efficiency.
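To illustrate why an outer-product formulation can cover all four operations with one routine, here is a minimal software sketch. It is not the GAS hardware dataflow, which the abstract does not describe; the function name `outer_product_multiply` and the dictionary-based storage are assumptions made for this example only. The product C = A x B is accumulated as a sum of outer products of column k of A with row k of B, and a vector is treated as a matrix with a single column.

```python
from collections import defaultdict


def outer_product_multiply(a_cols, b_rows):
    """Compute C = A @ B as a sum of outer products A[:, k] * B[k, :].

    a_cols: {k: [(i, value), ...]}  nonzeros of column k of A (hypothetical layout)
    b_rows: {k: [(j, value), ...]}  nonzeros of row k of B (hypothetical layout)
    Returns {(i, j): value} holding the accumulated entries of C.
    """
    c = defaultdict(float)
    for k, col in a_cols.items():
        row = b_rows.get(k)
        if not row:
            continue  # row k of B is empty, so this outer product contributes nothing
        for i, a in col:
            for j, b in row:
                c[(i, j)] += a * b  # accumulate partial products
    return dict(c)


# A = [[1, 0, 2],
#      [0, 3, 0]]  stored column by column
a_cols = {0: [(0, 1.0)], 1: [(1, 3.0)], 2: [(0, 2.0)]}

# SpMSpM: B = [[0, 4],
#              [5, 0],
#              [0, 6]]  stored row by row
b_rows = {0: [(1, 4.0)], 1: [(0, 5.0)], 2: [(1, 6.0)]}
spmspm = outer_product_multiply(a_cols, b_rows)

# SpMSpV: x = [0, 7, 0] treated as a sparse matrix with one column
x_rows = {1: [(0, 7.0)]}
spmspv = outer_product_multiply(a_cols, x_rows)

print(spmspm)
print(spmspv)
```

In this sketch, SpMV and SpMM correspond to the same call with every entry of the right-hand operand present in `b_rows`, so only the sparsity of the second operand changes, not the routine.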
Pages: 1427-1441
Page count: 15