Carat: Unlocking Value-Level Parallelism for Multiplier-Free GEMMs

Cited by: 0
Authors
Pan, Zhewen [1]
San Miguel, Joshua [1]
Wu, Di [2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Univ Cent Florida, Orlando, FL 32816 USA
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 2 | 2024
Keywords
value-level parallelism; value reuse; temporal computing; low-precision; batch processing; multiplier-free;
DOI
10.1145/3620665.3640364
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, hardware architectures optimized for general matrix multiplication (GEMM) have been well studied to deliver better performance and efficiency for deep neural networks. With trends towards batched, low-precision data, e.g., FP8 format in this work, we observe that there is growing untapped potential for value reuse. We propose a novel computing paradigm, value-level parallelism, whereby unique products are computed only once, and different inputs subscribe to (select) their products via temporal coding. Our architecture, Carat, employs value-level parallelism and transforms multiplication into accumulation, performing GEMMs with efficient multiplier-free hardware. Experiments show that, on average, Carat improves iso-area throughput and energy efficiency by 1.02× and 1.06× over a systolic array and 3.2× and 4.3× when scaled up to multiple nodes.
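The idea in the abstract, that unique products are computed once by accumulation and that each input selects its product, can be illustrated with a minimal software sketch. This is not Carat's hardware design: it assumes unsigned integer inputs in a small range instead of FP8, and the function name `vlp_gemm` and its parameters are hypothetical. For each weight, a running sum steps through `w*0, w*1, w*2, ...` using additions only, and every row of the batch picks the entry that matches its input value.

```python
def vlp_gemm(A, B, num_values):
    """Multiplier-free GEMM sketch (illustrative assumption, not Carat itself).

    A: M x K matrix of non-negative ints, each < num_values (the batched inputs).
    B: K x N matrix of ints (the weights).
    Returns C = A @ B computed without the multiplication operator.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k in range(K):
        for n in range(N):
            w = B[k][n]
            # Build every unique product w * t once, by repeated addition.
            # Step t of the running sum plays the role of a time step.
            products = []
            running = 0
            for _ in range(num_values):
                products.append(running)
                running += w
            # Each input in the batch "subscribes" to the product at the
            # step equal to its own value; no multiply is performed.
            for m in range(M):
                C[m][n] += products[A[m][k]]
    return C


A = [[1, 2], [3, 0], [2, 2]]   # batch of 3 inputs, values in [0, 4)
B = [[4, 5], [6, 7]]
print(vlp_gemm(A, B, num_values=4))  # [[16, 19], [12, 15], [20, 24]]
```

The reuse pays off when the batch size `M` is large relative to `num_values`: the table of products is built once per weight and shared by all rows, which is the software analogue of the value reuse the abstract describes for batched, low-precision data.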
Pages: 167-184
Page count: 18