H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Cited by: 4

Authors:

Luo, Yandong [1,2,3]

Yu, Shimeng [1]

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, 791 Atlantic Dr NW, Atlanta, GA 30332 USA

[2] Georgia Inst Technol, Atlanta, GA USA

[3] Apple, Cupertino, CA 95014 USA

Source:

ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS | 2024, Vol. 29, No. 3

Keywords:

Compute-in-memory; DNN accelerator; heterogeneous 3D integration; multi-head self-attention; transformer; MEMORY SRAM MACRO;

DOI:

10.1145/3649219

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Prior hardware accelerator designs primarily focused on single-chip solutions for 10 MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT2 models, which is about 2.6× to 3.1× higher than the baseline with 7 nm TPU and stacked FeFET memory.

References

Pages: 20

42 entries in total

[1] Aabrar, Khandker Akif; Kirtania, Sharadindu Gopal; Liang, Fu-Xiang; Gomez, Jorge; San Jose, Matthew; Luo, Yandong; Ye, Huacheng; Dutta, Sourav; Ravikumar, Priyankka G.; Ravindran, Prasanna Venkatesan; Khan, Asif Islam; Yu, Shimeng; Datta, Suman. BEOL-Compatible Superlattice FEFET Analog Synapse With Improved Linearity and Symmetry of Weight Update [J]. IEEE TRANSACTIONS ON ELECTRON DEVICES, 2022, 69 (04): 2094-2100
[2] Beyne Eric, 2017, 2017 IEEE International Electron Devices Meeting (IEDM), p32.4.1, DOI 10.1109/IEDM.2017.8268486
[3] Chen, F. C.; Chen, M. F.; Chiou, W. C.; Yu, Doug C. H. System on Integrated Chips (SoIC™) for 3D Heterogeneous Integration [J]. 2019 IEEE 69TH ELECTRONIC COMPONENTS AND TECHNOLOGY CONFERENCE (ECTC), 2019: 594-599
[4] Chen, Yu-Hsin; Yang, Tien-Ju; Emer, Joel S.; Sze, Vivienne. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices [J]. IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (02): 292-308
[5] Deaville Peter, 2022, 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), P268, DOI 10.1109/VLSITechnologyandCir46769.2022.9830153
[6] Derakhshandeh J., 2021, IEEE 71 EL COMP TECH
[7] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8] Dong Q, 2020, ISSCC DIG TECH PAP I, P242, DOI 10.1109/ISSCC19947.2020.9062985
[9] Dünkel S, 2017, INT EL DEVICES MEET
[10] Golonzka O, 2018, INT EL DEVICES MEET
