New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference

Cited by: 29
Authors
Zhang, Hao [1 ]
Chen, Dongdong [2 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
[2] Intel Corp, San Jose, CA 95134 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Neural networks; Standards; Deep learning; Training; Hardware; Adders; Pipelines; Multiply-accumulate unit; multiple-precision arithmetic; flexible precision arithmetic; deep neural network computing; computer arithmetic; ADD;
DOI
10.1109/TC.2019.2936192
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point mode, the unit performs either one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the unit more versatile, the bit-widths of the exponent and mantissa can be flexibly exchanged, and setting the exponent bit-width to zero turns the unit into a fixed-point MAC. In fixed-point mode, the unit likewise performs either one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend, and it can be further subdivided to compute the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the unit accumulates eight 1-bit logic AND operations, enabling support for binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed design provides this added flexibility with only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, it requires much less hardware while still providing an 8-bit exponent in its numerical format, preserving the large dynamic range needed for deep learning computation.
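The abstract enumerates the unit's operating modes: one 16-bit MAC, the sum of two 8-bit products, the sum of four 4-bit fixed-point products, or eight 1-bit AND accumulations, all added to a wide addend, with a configurable split between exponent and mantissa bits. The Python sketch below is a purely behavioral model of that sum-of-products-plus-addend idea under assumed bit packings; it is not the authors' hardware design, and the function names, formats, and rounding-free arithmetic are illustrative assumptions (the 1-bit binary mode is not modeled).

```python
# Behavioral model only: mimics the operation modes described in the abstract
# (sum of N products plus a wide addend, with a configurable exponent/mantissa
# split). NOT the authors' RTL; names, bit packing, and exact arithmetic are
# assumptions for illustration.

def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Interpret a (1 + exp_bits + man_bits)-bit pattern.

    exp_bits > 0  -> sign/exponent/mantissa floating-point value
    exp_bits == 0 -> two's-complement fixed-point integer (the abstract's
                     fixed-point mode is obtained by zeroing the exponent width)
    """
    total = 1 + exp_bits + man_bits
    if exp_bits == 0:
        return float(bits - (1 << total)) if bits >> (total - 1) else float(bits)
    sign = -1.0 if bits >> (total - 1) else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:  # subnormal value
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def flex_mac(a_ops, b_ops, addend, exp_bits, man_bits):
    """Sum-of-products plus addend: one operand pair models the 16-bit MAC
    mode, two pairs the 8-bit mode, four pairs the 4-bit fixed-point mode."""
    acc = addend
    for a, b in zip(a_ops, b_ops):
        acc += decode(a, exp_bits, man_bits) * decode(b, exp_bits, man_bits)
    return acc


# Example: two 8-bit operand pairs in an assumed 1-4-3 (sign-exponent-mantissa)
# format, accumulated onto an addend of 1.5; prints 8.0 (3.0*1.5 + 1.0*2.0 + 1.5).
print(flex_mac([0b01000100, 0b00111000], [0b00111100, 0b01000000], 1.5, 4, 3))
```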
Pages: 26-38
Number of pages: 13
Related Papers
50 records in total
• [31] Yao, Qiongjie; Liao, Xiaofei; Jin, Hai. Training deep neural network on multiple GPUs with a model averaging method. Peer-to-Peer Networking and Applications, 2018, 11(05): 1012-1021.
• [32] Kang, Guoliang; Li, Jun; Tao, Dacheng. Shakeout: A New Approach to Regularized Deep Neural Network Training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(05): 1245-1258.
• [33] Kang, Guoliang; Li, Jun; Tao, Dacheng. Shakeout: A New Regularized Deep Neural Network Training Scheme. Thirtieth AAAI Conference on Artificial Intelligence, 2016: 1751-1757.
• [34] Vinck, Toon; Jonckers, Nain; Dekkers, Gert; Prinzie, Jeffrey; Karsmakers, Peter. Mitigating multiple single-event upsets during deep neural network inference using fault-aware training. Journal of Instrumentation, 2025, 20(02).
• [35] Yang, En-Yu; Jia, Tianyu; Brooks, David; Wei, Gu-Yeon. FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference. 2021 IEEE 32nd International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2021), 2021: 266-273.
• [36] Osorio Rios, John; Armejach, Adria; Petit, Eric; Henry, Greg; Casas, Marc. Dynamically Adapting Floating-Point Precision to Accelerate Deep Neural Network Training. 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021), 2021: 980-987.
• [37] Cavigelli, Lukas; Rutishauser, Georg; Benini, Luca. EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(04): 723-734.
• [38] Choi, Seungkyu; Shin, Jaekang; Kim, Lee-Sup. A Deep Neural Network Training Architecture With Inference-Aware Heterogeneous Data-Type. IEEE Transactions on Computers, 2022, 71(05): 1216-1229.
• [39] Kim, Seyoung; Gokmen, Tayfun; Lee, Hyung-Min; Haensch, Wilfried E. Analog CMOS-based Resistive Processing Unit for Deep Neural Network Training. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 2017: 422-425.
• [40] Kim, Boyeal; Lee, Sang Hyun; Kim, Hyun; Duy-Thanh Nguyen; Minh-Son Le; Chang, Ik Joon; Kwon, Dohun; Yoo, Jin Hyeok; Choi, Jun Won; Lee, Hyuk-Jae. PCM: Precision-Controlled Memory System for Energy Efficient Deep Neural Network Training. Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE 2020), 2020: 1199-1204.