ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices

Cited by: 14
Authors
Azari, Elham [1]
Vrudhula, Sarma [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
Keywords
Recurrent neural network; LSTM; embedded systems; accelerator; deep learning; domain-specific architecture; low power
DOI
10.1145/3366634
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The next significant step in the evolution and proliferation of artificial intelligence technology will be the integration of neural network (NN) models within embedded and mobile systems. This calls for the design of compact, energy-efficient NN models in silicon. In this article, we present a scalable application-specific integrated circuit (ASIC) design of an energy-efficient Long Short-Term Memory (LSTM) accelerator, named ELSA, which is suitable for energy-constrained devices. It includes several architectural innovations to achieve small area and high energy efficiency. To reduce the area and power consumption of the overall design, the compute-intensive units of ELSA employ approximate multiplications and still achieve high performance and accuracy. The performance is further improved through efficient synchronization of the elastic pipeline stages to maximize utilization. The article also includes a performance model of ELSA, as a function of the number of hidden nodes and timesteps, permitting its use for the evaluation of any LSTM application. ELSA was implemented at the register transfer level (RTL) and was synthesized, placed, and routed in 65 nm technology. Its functionality is demonstrated for language modeling, a common application of LSTM. ELSA is compared against a baseline implementation of an LSTM accelerator with standard functional units and without any of the architectural innovations of ELSA. The article demonstrates that ELSA can achieve significant improvements in power, area, and energy efficiency when compared to the baseline design and several ASIC implementations reported in the literature, making it suitable for use in embedded systems and real-time applications.
Pages: 21