A Generator of Numerically-Tailored and High-Throughput Accelerators for Batched GEMMs

Authors
Ledoux, Louis [1]
Casas, Marc [1]
Affiliation
[1] Universitat Politècnica de Catalunya (UPC), Barcelona Supercomputing Center (BSC), Barcelona, Spain
Source
2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM 2022), 2022
Keywords
Stability; Design
DOI
10.1109/FCCM53951.2022.9786164
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy-efficiency goals. The generated arrays have three main novel aspects. First, the accelerators handle a large variety of computer number formats using intermediate representations based on our Sign Scale Significand (S3) format. Second, the processing elements perform all intermediate dot-product arithmetic operations required by the GEMM kernel without any intermediate rounding, which makes it possible to deliver better energy efficiency than state-of-the-art approaches while offering higher accuracy and reproducible results. Third, our accelerators feature the Half-Speed Sink Down (HSSD) mechanism, which maximizes the overlap of host-accelerator data transfers with GEMM computations. We evaluate our automatically generated designs in a cutting-edge setup composed of a POWER9 host, a CAPI (Coherent Accelerator Processor Interface) link, and a Virtex UltraScale Plus FPGA. The arrays can operate at the speed of the link and saturate it, reaching a throughput of 13 GB/s. Our fine-grained customization approach makes it possible to cover a wide range of accuracy-versus-efficiency scenarios, reaching 0.65 GOps/s/W while producing 1024 accurate bits, or 148.7 GOps/s/W with 6 accurate bits. Our configurations achieve up to 1613 GOps/s of system performance and power efficiencies of up to 240 GOps/s/W for the FPGA. This automatic generator is the first able to produce such a variety of designs. We improve the single-precision energy efficiency of state-of-the-art FPGA GEMM accelerators by 1.86x.
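The abstract's second point, performing every intermediate dot-product operation without rounding, can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design: it does not model the S3 format, the systolic array, or HSSD. It uses Python's `fractions.Fraction` as a stand-in for an exact (wide) hardware accumulator and compares it with ordinary binary64 accumulation that rounds after every step. The helper names `naive_dot`, `exact_dot`, and `exact_gemm` are hypothetical.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Dot product in binary64 with rounding after every multiply and every add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y  # product is rounded, then the running sum is rounded
    return acc


def exact_dot(xs, ys):
    """Dot product with no intermediate rounding.

    Every product and partial sum is kept exactly as a rational number
    (a software stand-in for a wide hardware accumulator); the only
    rounding is the final conversion back to binary64.
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # exact: no rounding here
    return float(acc)  # single round-to-nearest at the very end


def exact_gemm(a, b):
    """C = A @ B for lists of rows, each entry an exactly accumulated dot product."""
    cols = list(zip(*b))
    return [[exact_dot(row, col) for col in cols] for row in a]


if __name__ == "__main__":
    ones = [1.0, 1.0, 1.0]
    # Same three terms, two summation orders.
    print(naive_dot([1e16, 1.0, -1e16], ones))  # 0.0 (the 1.0 is absorbed)
    print(naive_dot([1e16, -1e16, 1.0], ones))  # 1.0
    print(exact_dot([1e16, 1.0, -1e16], ones))  # 1.0
    print(exact_dot([1e16, -1e16, 1.0], ones))  # 1.0
    print(exact_gemm([[1e16, 1.0, -1e16]], [[1.0], [1.0], [1.0]]))  # [[1.0]]
```

With per-step rounding, the result depends on the summation order (0.0 versus 1.0 for the same three terms). With a single final rounding, every order gives 1.0. This is the sense in which avoiding intermediate rounding yields results that are both more accurate and reproducible.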
Pages: 200-209
Number of pages: 10