ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Cited by: 4
Authors
Gong, Jing [1]
Saadat, Hassaan [2]
Gamaarachchi, Hasindu [1,3]
Javaid, Haris [4]
Hu, Xiaobo Sharon [5]
Parameswaran, Sri [6]
Affiliations
[1] UNSW Sydney, School of Computer Science and Engineering, Sydney, NSW 2052, Australia
[2] UNSW Sydney, School of Electrical Engineering and Telecommunications, Sydney, NSW 2052, Australia
[3] Garvan Institute of Medical Research, Kinghorn Centre for Clinical Genomics, Darlinghurst, NSW 2010, Australia
[4] Adaptive Embedded & AI Group, AMD, Singapore 469296, Singapore
[5] University of Notre Dame, Department of Computer Science and Engineering, Notre Dame, IN 46556, USA
[6] University of Sydney, School of Electrical and Information Engineering, Sydney, NSW 2006, Australia
Keywords
Training; Computer architecture; Hardware; Graphics processing units; Computational modeling; Libraries; Convergence; Approximate multiplier; approximate TensorFlow (TF); deep neural network (DNN) training
DOI
10.1109/TCAD.2023.3253045
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers that support DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This article presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve simulation speed at the multiplier level with AMSim, a novel LUT-based approximate floating-point (FP) multiplier simulator for GPUs. Additionally, a novel flow is presented to seamlessly convert C/C++ functional models of approximate FP multipliers into AMSim. ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for three application domains: image classification, object detection, and neural machine translation. The evaluations demonstrate similar convergence behavior and a negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate-multiplier simulation for training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Even against the original TensorFlow, which relies on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, ApproxTrain is on average only 8x slower.
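To make the abstract's two central ideas concrete, the sketch below illustrates, in plain C++, what a C/C++ functional model of an approximate FP32 multiplier can look like and how such a model can be tabulated into a lookup table (LUT) so that simulation reduces to exact sign/exponent handling plus one table access, which is the general idea behind AMSim as described above. This is a minimal illustrative sketch, not code from ApproxTrain: the truncated-mantissa multiplier design, the 7-bit LUT granularity, and all identifiers (approx_fp32_mul, build_lut, lut_mul, MANT_BITS) are assumptions made for this example, and NaN/Inf handling is omitted.

// Illustrative functional model + LUT tabulation. All names and parameters are
// assumptions for this sketch; this is not the ApproxTrain/AMSim source.
#include <cstdint>
#include <cstring>
#include <cstdio>
#include <vector>

constexpr int MANT_BITS = 7;                 // assumed LUT granularity: top 7 fraction bits
constexpr uint32_t LUT_SIDE = 1u << MANT_BITS;

static uint32_t bits_of(float f)     { uint32_t u; std::memcpy(&u, &f, 4); return u; }
static float    float_of(uint32_t u) { float f; std::memcpy(&f, &u, 4); return f; }

// Hypothetical C/C++ functional model: exact sign/exponent arithmetic, mantissas
// truncated to MANT_BITS fraction bits before multiplying, no rounding.
float approx_fp32_mul(float a, float b) {
    uint32_t ua = bits_of(a), ub = bits_of(b);
    uint32_t sign = (ua ^ ub) & 0x80000000u;
    uint32_t ea = (ua >> 23) & 0xFFu, eb = (ub >> 23) & 0xFFu;
    if (ea == 0 || eb == 0) return float_of(sign);            // flush zero/subnormal inputs
    uint32_t ma = (1u << MANT_BITS) | ((ua >> (23 - MANT_BITS)) & (LUT_SIDE - 1u));
    uint32_t mb = (1u << MANT_BITS) | ((ub >> (23 - MANT_BITS)) & (LUT_SIDE - 1u));
    uint32_t prod  = ma * mb;                                  // significand product in [1, 4)
    uint32_t carry = (prod >> (2 * MANT_BITS + 1)) & 1u;       // 1 if product >= 2.0
    int32_t  ebias = (int32_t)ea + (int32_t)eb - 127 + (int32_t)carry;
    if (ebias <= 0)   return float_of(sign);                   // underflow -> signed zero
    if (ebias >= 255) return float_of(sign | 0x7F800000u);     // overflow  -> signed infinity
    uint32_t fracbits = carry ? 2u * MANT_BITS + 1u : 2u * MANT_BITS;
    uint32_t frac = (prod & ((1u << fracbits) - 1u)) << (23u - fracbits);
    return float_of(sign | ((uint32_t)ebias << 23) | frac);
}

// AMSim-style idea: the model only inspects MANT_BITS fraction bits per operand, so its
// mantissa behaviour can be tabulated once and replayed later with a single lookup.
struct LutEntry { uint32_t frac; uint8_t carry; };

std::vector<LutEntry> build_lut() {
    std::vector<LutEntry> lut(LUT_SIDE * LUT_SIDE);
    for (uint32_t ia = 0; ia < LUT_SIDE; ++ia)
        for (uint32_t ib = 0; ib < LUT_SIDE; ++ib) {
            uint32_t ma = (1u << MANT_BITS) | ia, mb = (1u << MANT_BITS) | ib;
            uint32_t prod  = ma * mb;
            uint8_t  carry = (prod >> (2 * MANT_BITS + 1)) & 1u;
            uint32_t fracbits = carry ? 2u * MANT_BITS + 1u : 2u * MANT_BITS;
            uint32_t frac = (prod & ((1u << fracbits) - 1u)) << (23u - fracbits);
            lut[ia * LUT_SIDE + ib] = { frac, carry };
        }
    return lut;
}

// Simulation-time multiply: exact sign/exponent handling plus one table access.
float lut_mul(const std::vector<LutEntry>& lut, float a, float b) {
    uint32_t ua = bits_of(a), ub = bits_of(b);
    uint32_t sign = (ua ^ ub) & 0x80000000u;
    uint32_t ea = (ua >> 23) & 0xFFu, eb = (ub >> 23) & 0xFFu;
    if (ea == 0 || eb == 0) return float_of(sign);
    uint32_t ia = (ua >> (23 - MANT_BITS)) & (LUT_SIDE - 1u);
    uint32_t ib = (ub >> (23 - MANT_BITS)) & (LUT_SIDE - 1u);
    const LutEntry& ent = lut[ia * LUT_SIDE + ib];
    int32_t ebias = (int32_t)ea + (int32_t)eb - 127 + (int32_t)ent.carry;
    if (ebias <= 0)   return float_of(sign);
    if (ebias >= 255) return float_of(sign | 0x7F800000u);
    return float_of(sign | ((uint32_t)ebias << 23) | ent.frac);
}

int main() {
    std::vector<LutEntry> lut = build_lut();
    // The direct functional model and the LUT replay produce identical results.
    std::printf("model %.6f  lut %.6f  exact %.6f\n",
                approx_fp32_mul(1.7f, -2.3f), lut_mul(lut, 1.7f, -2.3f), 1.7f * -2.3f);
    return 0;
}

Per the abstract, the actual framework generates such a table from the user's functional model through a dedicated conversion flow and queries it from CUDA kernels integrated into TensorFlow's operators; the sketch above shows only the host-side logic of that idea.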
Pages: 3505-3518
Number of pages: 14