H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Cited by: 4
Authors
Luo, Yandong [1, 2, 3]
Yu, Shimeng [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, 791 Atlantic Dr NW, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Apple, Cupertino, CA 95014 USA
Keywords
Compute-in-memory; DNN accelerator; heterogeneous 3D integration; multi-head self-attention; transformer; MEMORY SRAM MACRO
DOI
10.1145/3649219
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Prior hardware accelerator designs primarily focused on single-chip solutions for 10 MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator designs due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT2 models, which is about 2.6× to 3.1× higher than the baseline with 7 nm TPU and stacked FeFET memory.
Pages: 20