CPSAA: Accelerating Sparse Attention Using Crossbar-Based Processing-In-Memory Architecture

Cited by: 2
Authors
Li, Huize [1 ]
Jin, Hai [1 ]
Zheng, Long [1 ]
Liao, Xiaofei [1 ]
Huang, Yu [1 ]
Liu, Cong [1 ]
Xu, Jiahong [1 ]
Duan, Zhuohui [1 ]
Chen, Dan [1 ]
Gui, Chuangyi [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab,Cluster & Grid Comp L, Wuhan 430074, Peoples R China
Keywords
Sparse matrices; Computer architecture; Matrix converters; Microprocessors; Field programmable gate arrays; Virtual machine monitors; Hardware; Attention mechanism; domain-specific accelerator; processing-in-memory; resistive random access memory (ReRAM)
DOI
10.1109/TCAD.2023.3344524
CLC classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Attention-based neural networks attract great interest due to their excellent accuracy. However, the attention mechanism spends huge computational effort on unnecessary calculations, significantly limiting system performance. To reduce these unnecessary calculations, researchers propose sparse attention, which converts some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses because the sparse attention matrix is generally unstructured. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM) sparse attention accelerator that eliminates off-chip data transmissions. 1) We present a novel attention calculation mode to balance crossbar writing and crossbar processing latency. 2) We design a novel PIM-based sparsity-pruning architecture to eliminate the pruning phase's off-chip data transfers. 3) Finally, we present novel crossbar-based SDDMM and SpMM methods that process unstructured sparse attention matrices by coupling two types of crossbar arrays. Experimental results show that CPSAA achieves average performance improvements of 89.6x, 32.2x, 17.8x, 3.39x, and 3.84x, and energy savings of 755.6x, 55.3x, 21.3x, 5.7x, and 4.9x, compared with GPU, field-programmable gate array (FPGA), SANGER, ReBERT, and ReTransformer, respectively.
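The DDMM-to-SDDMM/SpMM conversion the abstract describes can be sketched in NumPy. This is a minimal functional illustration, not CPSAA's hardware dataflow: the mask M is a hypothetical unstructured sparsity pattern, and the toy sizes (n=4, d=8) are chosen only for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                          # sequence length, head dimension (toy sizes)
Q = rng.standard_normal((n, d))      # queries
K = rng.standard_normal((n, d))      # keys
V = rng.standard_normal((n, d))      # values

# Hypothetical unstructured sparsity mask (1 = keep this query-key score).
# The diagonal is forced on so every row attends to at least one position.
M = np.maximum((rng.random((n, n)) < 0.5).astype(float), np.eye(n))

# SDDMM: sample the dense-dense product Q @ K^T at the nonzeros of M.
# (A real accelerator computes only the kept entries; here we mask afterwards.)
S = M * (Q @ K.T)

# Row-wise softmax over the kept entries only (pruned positions get -inf).
S_masked = np.where(M > 0, S, -np.inf)
P = np.exp(S_masked - S_masked.max(axis=1, keepdims=True))
P = P / P.sum(axis=1, keepdims=True)

# SpMM: sparse attention-weight matrix times the dense value matrix.
O = P @ V
print(O.shape)  # prints (4, 8)
```

In dense attention, S = Q @ K.T is a full DDMM regardless of which scores matter; the mask turns the score computation into an SDDMM and the value aggregation into an SpMM, which is where the claimed savings come from.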
Pages: 1741-1754
Page count: 14