A Cascaded ReRAM-based Crossbar Architecture for Transformer Neural Network Acceleration

Cited by: 0
Authors
Xu, Jiahong [1 ]
Liu, Haikun [1 ]
Peng, Xiaoyang [1 ]
Duan, Zhuohui [1 ]
Liao, Xiaofei [1 ]
Jin, Hai [1 ]
Affiliations
[1] Huazhong University of Science and Technology, Wuhan, Hubei, China
Funding
National Natural Science Foundation of China
Keywords
analog-to-digital conversion; PIM; ReRAM; Transformer;
DOI
10.1145/3701034
Abstract
Emerging resistive random-access memory (ReRAM) based processing-in-memory (PIM) accelerators have been increasingly explored in recent years because they can efficiently perform in-situ matrix-vector multiplication (MVM) operations involved in a wide spectrum of artificial neural networks. However, significant challenges remain in applying existing ReRAM-based PIM accelerators to the widely used Transformer neural networks. Because Transformers involve a series of matrix-matrix multiplication (MatMul) operations with data dependencies, intermediate MatMul results must be written to ReRAM crossbar arrays for further processing. Conventional ReRAM-based PIM accelerators therefore often suffer from the high latency of ReRAM writes and intra-layer pipeline stalls.

In this paper, we propose ReCAT, a ReRAM-based PIM accelerator designed specifically for Transformers. ReCAT exploits transimpedance amplifiers (TIAs) to cascade a pair of crossbar arrays for the MatMul operations involved in the self-attention mechanism. The intermediate result of a MatMul generated by one crossbar array can be mapped directly onto another crossbar array, avoiding costly analog-to-digital conversions. In this way, ReCAT allows MVM operations to overlap with the corresponding data mapping, hiding the high latency of ReRAM writes. Furthermore, we propose an analog-to-digital converter (ADC) virtualization scheme that dynamically shares scarce ADCs among a group of crossbar arrays, significantly improving ADC utilization and eliminating the performance bottleneck of MVM operations. Experimental results show that ReCAT achieves average performance improvements of 207.3×, 2.11×, and 3.06× over other Transformer acceleration solutions, namely GPUs, ReBert, and ReTransformer, respectively. © 2024 Copyright held by the owner/author(s).
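The MatMul data dependency that motivates the cascaded crossbar design can be made concrete with a short sketch. The NumPy example below is not taken from the paper; it is standard single-head scaled dot-product attention with illustrative shapes, shown only to highlight that the output of the first MatMul (Q·K^T) is an operand of the second MatMul (with V), i.e., the intermediate result that a conventional ReRAM PIM would have to digitize and write back into a crossbar before proceeding.

# Minimal sketch (plain NumPy, no PIM modeling): the two chained MatMuls of
# self-attention. Shapes and names are illustrative assumptions.
import numpy as np

def self_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # MatMul 1: depends only on Q and K
    # On a conventional ReRAM PIM, `scores` (an intermediate result) would be
    # converted by ADCs and written into another crossbar array at this point,
    # which is the high-latency ReRAM-write step the cascaded crossbars avoid.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                     # MatMul 2: consumes MatMul 1's output

# Toy usage: sequence length 4, model dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = self_attention(Q, K, V)
print(out.shape)   # (4, 8)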