ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Cited by: 4
Authors
Gong, Jing [1 ]
Saadat, Hassaan [2 ]
Gamaarachchi, Hasindu [1 ,3 ]
Javaid, Haris [4 ]
Hu, Xiaobo Sharon [5 ]
Parameswaran, Sri [6 ]
Affiliations
[1] UNSW Sydney, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] UNSW Sydney, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
[3] Garvan Inst Med Res, Kinghorn Ctr Clin Genom, Darlinghurst, NSW 2010, Australia
[4] Adapt Embedded & AI Grp, AMD, Singapore 469296, Singapore
[5] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[6] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
Keywords
Training; Computer architecture; Hardware; Graphics processing units; Computational modeling; Libraries; Convergence; Approximate multiplier; approximate TensorFlow (TF); deep neural network (DNN) training;
DOI
10.1109/TCAD.2023.3253045
Chinese Library Classification (CLC): TP3 [computing technology, computer technology]
Discipline code: 0812
Abstract
Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This article presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). Additionally, a novel flow is presented to seamlessly convert C/C++ functional models of approximate FP multipliers into AMSim. ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for three application domains: image classification, object detection, and neural machine translation. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and Bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Built on highly optimized, closed-source cuDNN/cuBLAS libraries with native hardware multipliers, the original TensorFlow is, on average, only 8x faster than ApproxTrain.
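To make the LUT-based simulation idea in the abstract concrete: an approximate FP multiplier's error typically lives in the significand product, so its functional model can be tabulated once over the top K mantissa bits of each operand and each simulated multiply reduced to bit-field extraction plus one table lookup. The sketch below is a minimal single-threaded illustration of that general technique, not ApproxTrain's actual AMSim implementation; the names `amul`, `approx_mantissa_product`, and the choice K = 8 are hypothetical, and the placeholder functional model here is exact truncation rather than a real approximate design.

```python
import struct

def bits(x: float) -> int:
    """Reinterpret a float32 as its 32-bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def from_bits(u: int) -> float:
    """Reinterpret a 32-bit pattern as a float32."""
    return struct.unpack('<f', struct.pack('<I', u))[0]

K = 8  # top mantissa bits kept by this (hypothetical) approximate design

def approx_mantissa_product(ma: int, mb: int) -> float:
    """Placeholder functional model: exact product of the truncated
    significands 1.ma and 1.mb. A real C/C++ functional model would
    encode the approximate multiplier's error behavior here."""
    return (1.0 + ma / (1 << K)) * (1.0 + mb / (1 << K))

# Precompute the LUT once from the functional model: 2^K x 2^K entries.
LUT = [[approx_mantissa_product(i, j) for j in range(1 << K)]
       for i in range(1 << K)]

def amul(a: float, b: float) -> float:
    """Simulate one approximate FP32 multiply via a LUT lookup."""
    ua, ub = bits(a), bits(b)
    sign = (ua ^ ub) & 0x80000000
    ea, eb = (ua >> 23) & 0xFF, (ub >> 23) & 0xFF
    if ea == 0 or eb == 0:            # flush zeros/denormals (simplification)
        return from_bits(sign)
    ma = (ua >> (23 - K)) & ((1 << K) - 1)   # top K mantissa bits of a
    mb = (ub >> (23 - K)) & ((1 << K) - 1)   # top K mantissa bits of b
    p = LUT[ma][mb]                   # significand product, in [1, 4)
    e = ea + eb - 127                 # biased exponent of the product
    if p >= 2.0:                      # renormalise into [1, 2)
        p /= 2.0
        e += 1
    frac = int((p - 1.0) * (1 << 23))
    return from_bits(sign | (e << 23) | frac)
```

On a GPU, the precomputed table can live in device memory and the lookup replaces every multiply inside the convolution and GEMM kernels, which is what makes the simulation fast relative to evaluating the C/C++ functional model per multiplication on a CPU.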
Pages: 3505-3518 (14 pages)
Related Papers (50 records in total; items [41]-[50] shown)
  • [41] AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference
    He, Xin
    Ke, Liu
    Lu, Wenyan
    Yan, Guihai
    Zhang, Xuan
    PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED '18), 2018, : 110 - 115
  • [42] Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator
    Park, Yeonhong
    Lee, Seung Yul
    Shin, Hoon
    Heo, Jun
    Ham, Tae Jun
    Lee, Jae W.
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2020,
  • [43] MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits
    Hou, Xiaofeng
    Liu, Jiacheng
    Tang, Xuehan
    Li, Chao
    Cheng, Kwang-Ting
    Li, Li
    Guo, Minyi
    EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 426 - 440
  • [44] Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization
    Chen, Tzu-Yu
    Hsiao, Po-Wen
    Chi, Tai-Shih
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1929 - 1933
  • [45] Deep Nibble: A 4-bit Number Format for Efficient DNN Training and Inference in FPGA
    Duarte, Luiz F. H.
    Nardes, George B.
    Grignani, Wesley
    Melo, Douglas R.
    Zeferino, Cesar A.
    2024 37TH SBC/SBMICRO/IEEE SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN, SBCCI 2024, 2024, : 90 - 94
  • [46] Training neural networks to approximate traffic simulation outcomes
    Gora, Pawel
    Bardonski, Marek
    2017 5TH IEEE INTERNATIONAL CONFERENCE ON MODELS AND TECHNOLOGIES FOR INTELLIGENT TRANSPORTATION SYSTEMS (MT-ITS), 2017, : 889 - 894
  • [47] Approximate Bayesian Computation by Subset Simulation for Parameter Inference of Dynamical Models
    Vakilzadeh, Majid K.
    Huang, Yong
    Beck, James L.
    Abrahamsson, Thomas
    MODEL VALIDATION AND UNCERTAINTY QUANTIFICATION, VOL 3, 2016, : 37 - 50
  • [48] ProxSim: GPU-based Simulation Framework for Cross-Layer Approximate DNN Optimization
    De la Parra, Cecilia
    Guntoro, Andre
    Kumar, Akash
    PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020, : 1193 - 1198
  • [49] FAST FACTORISATION OF PROBABILISTIC POTENTIALS AND ITS APPLICATION TO APPROXIMATE INFERENCE IN BAYESIAN NETWORKS
    Cano, Andres
    Gomez-Olmedo, Manuel
    Perez-Ariza, Cora B.
    Salmeron, Antonio
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2012, 20 (02) : 223 - 243
  • [50] Ultra-Fast Approximate Inference Using Variational Functional Mixed Models
    Huo, Shuning
    Morris, Jeffrey S.
    Zhu, Hongxiao
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (02) : 353 - 365