New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference

Cited by: 29
Authors
Zhang, Hao [1 ]
Chen, Dongdong [2 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
[2] Intel Corp, San Jose, CA 95134 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Neural networks; Standards; Deep learning; Training; Hardware; Adders; Pipelines; Multiply-accumulate unit; multiple-precision arithmetic; flexible precision arithmetic; deep neural network computing; computer arithmetic; ADD;
DOI
10.1109/TC.2019.2936192
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point mode, the unit performs either one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the unit more versatile, the bit-widths of the exponent and mantissa can be flexibly exchanged, and setting the exponent bit-width to zero turns the unit into a fixed-point MAC. In fixed-point mode, the unit likewise performs either one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend, and it can be further subdivided to compute the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the unit accumulates eight 1-bit logic AND operations, enabling support for binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed design provides this added flexibility with only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, it requires much less hardware while still providing an 8-bit exponent in its numerical format, preserving the large dynamic range needed for deep learning computation.
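The abstract enumerates the unit's operating modes: one 16-bit MAC, the sum of two 8-bit products, the sum of four 4-bit fixed-point products, or eight 1-bit AND accumulations, all added to a wide addend, with a configurable split between exponent and mantissa bits. The Python sketch below is a purely behavioral model of that sum-of-products-plus-addend idea under assumed bit packings; it is not the authors' hardware design, and the function names, formats, and rounding-free arithmetic are illustrative assumptions (the 1-bit binary mode is not modeled).

```python
# Behavioral model only: mimics the operation modes described in the abstract
# (sum of N products plus a wide addend, with a configurable exponent/mantissa
# split). NOT the authors' RTL; names, bit packing, and exact arithmetic are
# assumptions for illustration.

def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Interpret a (1 + exp_bits + man_bits)-bit pattern.

    exp_bits > 0  -> sign/exponent/mantissa floating-point value
    exp_bits == 0 -> two's-complement fixed-point integer (the abstract's
                     fixed-point mode is obtained by zeroing the exponent width)
    """
    total = 1 + exp_bits + man_bits
    if exp_bits == 0:
        return float(bits - (1 << total)) if bits >> (total - 1) else float(bits)
    sign = -1.0 if bits >> (total - 1) else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:  # subnormal value
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def flex_mac(a_ops, b_ops, addend, exp_bits, man_bits):
    """Sum-of-products plus addend: one operand pair models the 16-bit MAC
    mode, two pairs the 8-bit mode, four pairs the 4-bit fixed-point mode."""
    acc = addend
    for a, b in zip(a_ops, b_ops):
        acc += decode(a, exp_bits, man_bits) * decode(b, exp_bits, man_bits)
    return acc


# Example: two 8-bit operand pairs in an assumed 1-4-3 (sign-exponent-mantissa)
# format, accumulated onto an addend of 1.5; prints 8.0 (3.0*1.5 + 1.0*2.0 + 1.5).
print(flex_mac([0b01000100, 0b00111000], [0b00111100, 0b01000000], 1.5, 4, 3))
```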
Pages: 26-38
Number of pages: 13
Related Papers
50 records in total
• [31] Yao, Qiongjie; Liao, Xiaofei; Jin, Hai. Training deep neural network on multiple GPUs with a model averaging method. Peer-to-Peer Networking and Applications, 2018, 11(05): 1012-1021.
• [32] Kang, Guoliang; Li, Jun; Tao, Dacheng. Shakeout: A New Approach to Regularized Deep Neural Network Training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(05): 1245-1258.
• [33] Kang, Guoliang; Li, Jun; Tao, Dacheng. Shakeout: A New Regularized Deep Neural Network Training Scheme. Thirtieth AAAI Conference on Artificial Intelligence, 2016: 1751-1757.
• [34] Vinck, Toon; Jonckers, Nain; Dekkers, Gert; Prinzie, Jeffrey; Karsmakers, Peter. Mitigating multiple single-event upsets during deep neural network inference using fault-aware training. Journal of Instrumentation, 2025, 20(02).
• [35] Yang, En-Yu; Jia, Tianyu; Brooks, David; Wei, Gu-Yeon. FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference. 2021 IEEE 32nd International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2021), 2021: 266-273.
• [36] Osorio Rios, John; Armejach, Adria; Petit, Eric; Henry, Greg; Casas, Marc. Dynamically Adapting Floating-Point Precision to Accelerate Deep Neural Network Training. 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021), 2021: 980-987.
• [37] Cavigelli, Lukas; Rutishauser, Georg; Benini, Luca. EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(04): 723-734.
• [38] Choi, Seungkyu; Shin, Jaekang; Kim, Lee-Sup. A Deep Neural Network Training Architecture With Inference-Aware Heterogeneous Data-Type. IEEE Transactions on Computers, 2022, 71(05): 1216-1229.
• [39] Kim, Seyoung; Gokmen, Tayfun; Lee, Hyung-Min; Haensch, Wilfried E. Analog CMOS-based Resistive Processing Unit for Deep Neural Network Training. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 2017: 422-425.
• [40] Kim, Boyeal; Lee, Sang Hyun; Kim, Hyun; Duy-Thanh Nguyen; Minh-Son Le; Chang, Ik Joon; Kwon, Dohun; Yoo, Jin Hyeok; Choi, Jun Won; Lee, Hyuk-Jae. PCM: Precision-Controlled Memory System for Energy Efficient Deep Neural Network Training. Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE 2020), 2020: 1199-1204.