ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Cited by: 4
Authors
Gong, Jing [1 ]
Saadat, Hassaan [2 ]
Gamaarachchi, Hasindu [1 ,3 ]
Javaid, Haris [4 ]
Hu, Xiaobo Sharon [5 ]
Parameswaran, Sri [6 ]
Affiliations
[1] UNSW Sydney, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] UNSW Sydney, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
[3] Garvan Inst Med Res, Kinghorn Ctr Clin Genom, Darlinghurst, NSW 2010, Australia
[4] Adapt Embedded & AI Grp, AMD, Singapore 469296, Singapore
[5] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[6] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
Keywords
Training; Computer architecture; Hardware; Graphics processing units; Computational modeling; Libraries; Convergence; Approximate multiplier; approximate TensorFlow (TF); deep neural network (DNN) training
DOI
10.1109/TCAD.2023.3253045
CLC Classification Number
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This article presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). Additionally, a novel flow is presented to seamlessly convert C/C++ functional models of approximate FP multipliers into AMSim. ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy performance of DNN training with approximate multipliers for three application domains: image classification, object detection, and neural machine translation. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and Bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. The original TensorFlow, which relies on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, is on average only 8x faster than ApproxTrain.
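The abstract describes C/C++ functional models of approximate FP multipliers as the user-facing input that ApproxTrain converts into its LUT-based GPU simulator (AMSim). The sketch below is a minimal, hypothetical example of what such a functional model could look like: a Mitchell-style logarithmic multiplier that drops the mantissa cross term. It is not one of the multiplier designs evaluated in the paper, and the function name approx_fp32_mul and the scalar float(float, float) interface are illustrative assumptions only.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Reinterpret float <-> uint32 bit patterns (avoids strict-aliasing issues). */
static inline uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static inline float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

/*
 * approx_fp32_mul: Mitchell-style logarithmic approximation of an FP32 multiply.
 * The mantissa product (1+ma)(1+mb) is approximated by 1 + ma + mb, i.e. the
 * ma*mb cross term is dropped; special values are handled separately.
 */
float approx_fp32_mul(float a, float b)
{
    uint32_t ua = f2u(a), ub = f2u(b);
    uint32_t sign = (ua ^ ub) & 0x80000000u;

    int32_t ea = (int32_t)((ua >> 23) & 0xFFu);
    int32_t eb = (int32_t)((ub >> 23) & 0xFFu);
    if (ea == 0xFF || eb == 0xFF) return a * b;     /* Inf/NaN: defer to the exact multiply   */
    if (ea == 0 || eb == 0)       return u2f(sign); /* zero/subnormal inputs flush to +/- 0.0 */

    uint32_t ma = ua & 0x007FFFFFu;   /* fractional mantissa bits, value ma * 2^-23 */
    uint32_t mb = ub & 0x007FFFFFu;

    uint32_t msum = ma + mb;          /* Mitchell: (1+ma)(1+mb) ~= 1 + ma + mb         */
    int32_t  e    = ea + eb - 127;    /* biased exponent of the product                */
    uint32_t mres;
    if (msum & 0x00800000u) {         /* ma + mb >= 1: result ~= 2^(e+1) * (ma + mb)   */
        e   += 1;
        mres = msum & 0x007FFFFFu;
    } else {                          /* ma + mb <  1: result ~= 2^e * (1 + ma + mb)   */
        mres = msum;
    }
    if (e <= 0)   return u2f(sign);                 /* underflow -> signed zero     */
    if (e >= 255) return u2f(sign | 0x7F800000u);   /* overflow  -> signed infinity */
    return u2f(sign | ((uint32_t)e << 23) | mres);
}

int main(void)
{
    float a = 1.5f, b = 1.5f;
    printf("exact  : %f\n", a * b);                 /* 2.250000                      */
    printf("approx : %f\n", approx_fp32_mul(a, b)); /* 2.000000 (cross term dropped) */
    return 0;
}
```

Per the abstract, ApproxTrain's flow takes a scalar model of this kind, converts it into a lookup table, and evaluates it with the GPU-resident AMSim inside the TensorFlow compute kernels, so the same C/C++ description can be validated on the CPU and then used for fast GPU-based training and inference simulation.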
Pages: 3505 - 3518
Number of pages: 14
Related Papers
50 records in total
  • [31] ACBN: Approximate Calculated Batch Normalization for Efficient DNN On-Device Training Processor
    Li, Baoting
    Wang, Hang
    Luo, Fujie
    Zhang, Xuchong
    Sun, Hongbin
    Zheng, Nanning
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (06) : 738 - 748
  • [32] SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs
    Gong, Zhangxiaowen
    Ji, Houxiang
    Fletcher, Christopher W.
    Hughes, Christopher J.
    Baghsorkhi, Sara
    Torrellas, Josep
    2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, : 796 - 810
  • [33] Fast SVM Training Using Approximate Extreme Points
    Nandan, Manu
    Khargonekar, Pramod P.
    Talathi, Sachin S.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 59 - 98
  • [35] Fast Gauss RBF Training Using Approximate Pole
    Cao, Long-han
    Huang, Yang
    Pi, Chang-qian
    Fan, Zhou-jun
    Wang, Le-yuan
    INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016), 2016,
  • [36] Fast Relational Probabilistic Inference and Learning: Approximate Counting via Hypergraphs
    Das, Mayukh
    Dhami, Devendra Singh
    Kunapuli, Gautam
    Kersting, Kristian
    Natarajan, Sriraam
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 7816 - 7824
  • [37] A comprehensive exploration of approximate DNN models with a novel floating-point simulation framework
    Kwak, Myeongjin
    Kim, Jeonggeun
    Kim, Yongtae
    PERFORMANCE EVALUATION, 2024, 165
  • [38] PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters
    Zhang, Jinghui
    Niu, Geng
    Dai, Qiangsheng
    Li, Haorui
    Wu, Zhihua
    Dong, Fang
    Wu, Zhiang
    NEUROCOMPUTING, 2023, 555
  • [39] FAST MODEL INFERENCE AND TRAINING ON-BOARD OF SATELLITES
    Ruzicka, Vit
    Mateo-Garcia, Gonzalo
    Bridges, Chris
    Brunskill, Chris
    Purcell, Cormac
    Longepe, Nicolas
    Markham, Andrew
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 2002 - 2005
  • [40] A DISTRIBUTED ARCHITECTURE FOR FAST SGD SEQUENCE DISCRIMINATIVE TRAINING OF DNN ACOUSTIC MODELS
    Saon, George
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 183 - 188