ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Cited by: 4
Authors
Gong, Jing [1 ]
Saadat, Hassaan [2 ]
Gamaarachchi, Hasindu [1 ,3 ]
Javaid, Haris [4 ]
Hu, Xiaobo Sharon [5 ]
Parameswaran, Sri [6 ]
Affiliations
[1] UNSW Sydney, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
[2] UNSW Sydney, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
[3] Garvan Inst Med Res, Kinghorn Ctr Clin Genom, Darlinghurst, NSW 2010, Australia
[4] Adapt Embedded & AI Grp, AMD, Singapore 469296, Singapore
[5] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[6] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
Keywords
Training; Computer architecture; Hardware; Graphics processing units; Computational modeling; Libraries; Convergence; Approximate multiplier; approximate TensorFlow (TF); deep neural network (DNN) training;
DOI
10.1109/TCAD.2023.3253045
CLC classification
TP3 [Computing Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This article presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve simulation speed at the multiplier level with a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). Additionally, a novel flow is presented to seamlessly convert C/C++ functional models of approximate FP multipliers into AMSim. ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers in three application domains: image classification, object detection, and neural machine translation. The evaluations demonstrate similar convergence behavior and a negligible change in test accuracy compared to FP32 and Bfloat16 multipliers. Compared to CPU-based approximate-multiplier simulation for training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. The original TensorFlow, built on the highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, is on average only 8x faster than ApproxTrain.
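The LUT-based simulation idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's AMSim implementation; the function names (`lut_mul`, `_fields`) and the table size (`MBITS = 4` mantissa bits) are illustrative assumptions. The sketch handles signs and exponents exactly and looks up the product of the two truncated significands in a small precomputed table:

```python
import struct

MBITS = 4  # top mantissa bits used to index the LUT (illustrative choice)

# Precomputed table: LUT[ma][mb] holds the product of the two truncated
# significands (1 + ma/2^MBITS) * (1 + mb/2^MBITS), a value in [1, 4).
LUT = [[(1 + ma / 2**MBITS) * (1 + mb / 2**MBITS)
        for mb in range(2**MBITS)] for ma in range(2**MBITS)]

def _fields(x):
    """Decompose a float32 into (sign, biased exponent, top MBITS mantissa bits)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return (bits >> 31,
            (bits >> 23) & 0xFF,
            (bits >> (23 - MBITS)) & ((1 << MBITS) - 1))

def lut_mul(a, b):
    """Approximate float32 multiply: exact sign/exponent, LUT for significands.

    Subnormals, infinities, NaNs and final rounding are ignored in this sketch.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    sa, ea, ma = _fields(a)
    sb, eb, mb = _fields(b)
    sign = -1.0 if sa ^ sb else 1.0
    # Unbiased exponents add; the LUT supplies the significand product.
    return sign * LUT[ma][mb] * 2.0 ** ((ea - 127) + (eb - 127))
```

Because the table is indexed only by truncated significand bits, a 2^MBITS x 2^MBITS lookup replaces the mantissa multiplication entirely, which is what makes a GPU-resident LUT simulator fast; an arbitrary C/C++ functional model of an approximate multiplier can, in the same spirit, be tabulated once and then queried during training.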
Pages: 3505 - 3518
Page count: 14
Related Papers
50 items total
  • [21] Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference
    Seokhyeon Choi
    Kyuhong Shim
    Jungwook Choi
    Wonyong Sung
    Byonghyo Shim
    Journal of Signal Processing Systems, 2022, 94 : 929 - 943
  • [22] The Perfect Match: Selecting Approximate Multipliers for Energy-Efficient Neural Network Inference
    Spantidi, Ourania
    Anagnostopoulos, Iraklis
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING, HPSR, 2023,
  • [23] Fast and fair split computing for accelerating deep neural network (DNN) inference
    Cha, Dongju
    Lee, Jaewook
    Jung, Daeyoung
    Pack, Sangheon
    ICT EXPRESS, 2025, 11 (01): 47 - 52
  • [24] TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
    Choi, Seokhyeon
    Shim, Kyuhong
    Choi, Jungwook
    Sung, Wonyong
    Shim, Byonghyo
    2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), 2021, : 111 - 116
  • [25] Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference
    Choi, Seokhyeon
    Shim, Kyuhong
    Choi, Jungwook
    Sung, Wonyong
    Shim, Byonghyo
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2022, 94 (10): 929 - 943
  • [26] AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network
    Wang, Xinyu
    Jiang, Yong
    Bach, Nguyen
    Wang, Tao
    Huang, Zhongqiang
    Huang, Fei
    Tu, Kewei
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6019 - 6026
  • [27] STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators
    Munoz-Martinez, Francisco
    Abellan, Jose L.
    Acacio, Manuel E.
    Krishna, Tushar
    IEEE COMPUTER ARCHITECTURE LETTERS, 2021, 20 (02) : 122 - 125
  • [28] TLED: Training-Based Approximate Layer Exploration in DNNs with Efficient Multipliers
    Li, Kunlong
    Li, Zhen
    Wang, Lingli
    2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024, 2024, : 247 - 252
  • [29] STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators
    Munoz-Martinez, Francisco
    Abellan, Jose L.
    Acacio, Manuel E.
    Krishna, Tushar
    2021 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2021), 2021, : 201 - 213
  • [30] Towards Fast GPU-based Sparse DNN Inference: A Hybrid Compute Model
    Xu, Shaoxian
    Wu, Minkang
    Zheng, Long
    Shao, Zhiyuan
    Ye, Xiangyu
    Liao, Xiaofei
    Jin, Hai
    2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC), 2022,