H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Cited by: 4
Authors
Luo, Yandong [1 ,2 ,3 ]
Yu, Shimeng [1 ]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, 791 Atlantic Dr NW, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Apple, Cupertino, CA 95014 USA
Keywords
Compute-in-memory; DNN accelerator; heterogeneous 3D integration; multi-head self-attention; transformer; memory SRAM macro
DOI
10.1145/3649219
CLC classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Prior hardware accelerator designs have primarily focused on single-chip solutions for 10 MB-class computer vision models. GB-class transformer models for natural language processing (NLP) challenge existing accelerator designs with their massive parameter counts and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate carrying multiple 3D memory/logic hybrid cubes, each optimized for accelerating a different class of MatMul workload. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPUs). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT-2 models, which is about 2.6x to 3.1x higher than the baseline with a 7 nm TPU and stacked FeFET memory.
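The heterogeneous split between mixed-signal CIM and a digital TPU rests on the distinction between weight-stationary MatMuls, whose weights can stay resident in the 3D-stacked memory arrays, and activation-activation MatMuls generated at runtime by self-attention. The sketch below is a minimal illustration of that distinction for one BERT-base encoder layer; the MatMul/classify helpers and the assignment heuristic are assumptions for illustration, not the paper's actual mapping or its approximate-computing scheme.

```python
# Illustrative sketch (not the paper's mapper): classify the MatMul workloads
# of one transformer encoder layer by whether one operand is a static weight
# matrix (CIM-friendly) or both operands are runtime activations (digital TPU).

from dataclasses import dataclass


@dataclass
class MatMul:
    name: str
    m: int               # output rows
    k: int               # reduction dimension
    n: int               # output columns
    static_weight: bool  # True if one operand is a fixed weight matrix


def classify(workloads):
    """Assumed heuristic: weight-stationary MatMuls -> mixed-signal CIM,
    activation-activation MatMuls -> digital TPU."""
    return {w.name: ("CIM" if w.static_weight else "TPU") for w in workloads}


# One BERT-base encoder layer (hidden d=768, FFN width 3072, sequence L=128);
# attention shapes are per-layer aggregates, ignoring the per-head split.
d, ffn, L = 768, 3072, 128
layer = [
    MatMul("Q/K/V projection",  L, d,   3 * d, static_weight=True),
    MatMul("attention QK^T",    L, d,   L,     static_weight=False),
    MatMul("score x V",         L, L,   d,     static_weight=False),
    MatMul("output projection", L, d,   d,     static_weight=True),
    MatMul("FFN up",            L, d,   ffn,   static_weight=True),
    MatMul("FFN down",          L, ffn, d,     static_weight=True),
]

if __name__ == "__main__":
    for name, target in classify(layer).items():
        print(f"{name:18s} -> {target}")
```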
Pages: 20