H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Cited by: 4
Authors
Luo, Yandong [1 ,2 ,3 ]
Yu, Shimeng [1 ]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, 791 Atlantic Dr NW, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Apple, Cupertino, CA 95014 USA
Keywords
Compute-in-memory; DNN accelerator; heterogeneous 3D integration; multi-head self-attention; transformer; memory SRAM macro
DOI
10.1145/3649219
CLC classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Prior hardware accelerator designs have primarily focused on single-chip solutions for 10 MB-class computer vision models. GB-class transformer models for natural language processing (NLP) challenge existing accelerator designs with their massive parameter counts and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate carrying multiple 3D memory/logic hybrid cubes, each optimized for accelerating a different class of MatMul workload. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPUs). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT-2 models, which is about 2.6x to 3.1x higher than the baseline with a 7 nm TPU and stacked FeFET memory.
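The heterogeneous split between mixed-signal CIM and a digital TPU rests on the distinction between weight-stationary MatMuls, whose weights can stay resident in the 3D-stacked memory arrays, and activation-activation MatMuls generated at runtime by self-attention. The sketch below is a minimal illustration of that distinction for one BERT-base encoder layer; the MatMul/classify helpers and the assignment heuristic are assumptions for illustration, not the paper's actual mapping or its approximate-computing scheme.

```python
# Illustrative sketch (not the paper's mapper): classify the MatMul workloads
# of one transformer encoder layer by whether one operand is a static weight
# matrix (CIM-friendly) or both operands are runtime activations (digital TPU).

from dataclasses import dataclass


@dataclass
class MatMul:
    name: str
    m: int               # output rows
    k: int               # reduction dimension
    n: int               # output columns
    static_weight: bool  # True if one operand is a fixed weight matrix


def classify(workloads):
    """Assumed heuristic: weight-stationary MatMuls -> mixed-signal CIM,
    activation-activation MatMuls -> digital TPU."""
    return {w.name: ("CIM" if w.static_weight else "TPU") for w in workloads}


# One BERT-base encoder layer (hidden d=768, FFN width 3072, sequence L=128);
# attention shapes are per-layer aggregates, ignoring the per-head split.
d, ffn, L = 768, 3072, 128
layer = [
    MatMul("Q/K/V projection",  L, d,   3 * d, static_weight=True),
    MatMul("attention QK^T",    L, d,   L,     static_weight=False),
    MatMul("score x V",         L, L,   d,     static_weight=False),
    MatMul("output projection", L, d,   d,     static_weight=True),
    MatMul("FFN up",            L, d,   ffn,   static_weight=True),
    MatMul("FFN down",          L, ffn, d,     static_weight=True),
]

if __name__ == "__main__":
    for name, target in classify(layer).items():
        print(f"{name:18s} -> {target}")
```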
Pages: 20