New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference

Cited by: 29
Authors
Zhang, Hao [1 ]
Chen, Dongdong [2 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
[2] Intel Corp, San Jose, CA 95134 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Neural networks; Standards; Deep learning; Training; Hardware; Adders; Pipelines; Multiply-accumulate unit; multiple-precision arithmetic; flexible precision arithmetic; deep neural network computing; computer arithmetic; ADD;
DOI
10.1109/TC.2019.2936192
Chinese Library Classification (CLC) code
TP3 [computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point format, the unit supports one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the proposed MAC unit more versatile, the bit-widths of the exponent and mantissa can be flexibly exchanged, and by setting the exponent bit-width to zero the unit also supports fixed-point operations. In fixed-point format, the unit supports one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend, and it can be further divided to support the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the proposed MAC unit supports the accumulation of eight 1-bit logic AND operations to enable binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed MAC unit provides much more flexibility with only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, the proposed MAC unit requires much less hardware while still providing an 8-bit exponent in the numerical format to maintain the large dynamic range needed for deep learning computation.
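For illustration, the precision modes listed in the abstract can be summarized in a small behavioral model. The Python sketch below is an assumption-based functional illustration only: the mode names and the helper flexible_mac are hypothetical and do not come from the paper, and the model covers only the arithmetic accumulated in each mode, not the shared multiplier, alignment, and adder hardware the authors describe.

# Behavioral sketch (assumption, not the authors' hardware design): a functional
# model of the precision modes described in the abstract. Mode names are illustrative.
def flexible_mac(mode, a, b, addend):
    """Return addend plus the dot product implied by the selected precision mode."""
    if mode in ("fp16", "int16"):      # one 16-bit MAC (floating- or fixed-point)
        return addend + a[0] * b[0]
    if mode in ("fp8x2", "int8x2"):    # sum of two 8-bit products plus a 16-bit addend
        return addend + a[0] * b[0] + a[1] * b[1]
    if mode == "int4x4":               # sum of four 4-bit products plus a 16-bit addend
        return addend + sum(x * y for x, y in zip(a[:4], b[:4]))
    if mode == "bin8":                 # accumulate eight 1-bit logic AND results (binary networks)
        return addend + sum(x & y for x, y in zip(a[:8], b[:8]))
    raise ValueError("unknown mode: " + mode)

# Example: four 4-bit products accumulated onto a 16-bit partial sum.
print(flexible_mac("int4x4", [1, 2, 3, 4], [5, 6, 7, 8], 100))   # prints 170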
Pages: 26-38
Page count: 13
Related papers
50 records in total
  • [21] PV-MAC: Multiply-and-accumulate unit structure exploiting precision variability in on-device convolutional neural networks
    Kang, Jongsung
    Kim, Taewhan
    INTEGRATION-THE VLSI JOURNAL, 2020, 71 : 76 - 85
  • [22] Efficient spiking neural network training and inference with reduced precision memory and computing
    Wang, Yi
    Shahbazi, Karim
    Zhang, Hao
    Oh, Kwang-Il
    Lee, Jae-Jin
    Ko, Seok-Bum
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2019, 13 (05) : 397 - 404
  • [23] Fast Sparse Deep Neural Network Inference with Flexible SpMM Optimization Space Exploration
    Xin, Jie
    Ye, Xianqi
    Zheng, Long
    Wang, Qinggang
    Huang, Yu
    Yao, Pengcheng
    Yu, Linchen
    Liao, Xiaofei
    Jin, Hai
    2021 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2021,
  • [24] FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision
    Imani, Mohsen
    Gupta, Saransh
    Kim, Yeseong
    Rosing, Tajana
    PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19), 2019, : 802 - 815
  • [25] Efficient deep neural network training via decreasing precision with layer capacity
    Shen, Ao
    Lai, Zhiquan
    Sun, Tao
    Li, Shengwei
    Ge, Keshi
    Liu, Weijie
    Li, Dongsheng
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (10)
  • [26] LDD: High-Precision Training of Deep Spiking Neural Network Transformers Guided by an Artificial Neural Network
    Liu, Yuqian
    Zhao, Chujie
    Jiang, Yizhou
    Fang, Ying
    Chen, Feng
    BIOMIMETICS, 2024, 9 (07)
  • [27] A training method for deep neural network inference accelerators with high tolerance for their hardware imperfection
    Gao, Shuchao
    Ohsawa, Takashi
    JAPANESE JOURNAL OF APPLIED PHYSICS, 2024, 63 (02)
  • [28] Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
    Wang, Xuanda
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2023 DATA COMPRESSION CONFERENCE, DCC, 2023, : 371 - 371
  • [29] Training deep neural network on multiple GPUs with a model averaging method
    Yao, Qiongjie
    Liao, Xiaofei
    Jin, Hai
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2018, 11 : 1012 - 1021
  • [30] Fast training algorithm for deep neural network using multiple GPUs
    Dai, L.
    JOURNAL OF TSINGHUA UNIVERSITY (SCIENCE AND TECHNOLOGY), 53