Custom Hardware Architectures for Deep Learning on Portable Devices: A Review

Cited by: 40
Authors
Zaman, Kh Shahriya [1 ]
Reaz, Mamun Bin Ibne [1 ]
Ali, Sawal Hamid Md [1 ]
Bakar, Ahmad Ashrif A. [1 ]
Chowdhury, Muhammad Enamul Hoque [2 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Elect Elect & Syst Engn, Bangi 43600, Malaysia
[2] Qatar Univ, Dept Elect Engn, Doha 27113, Qatar
Funding
National Research Foundation, Singapore
Keywords
Computer architecture; Hardware; Computational modeling; Artificial neural networks; Optimization; Memory management; Convolution; Application-specific integrated circuit (ASIC); deep learning (DL); deep neural network (DNN); energy-efficient architectures; field-programmable gate array (FPGA); hardware accelerator; machine learning (ML); neural network hardware; review; NEURAL-NETWORKS; CHIP; FRAMEWORK; MEMORY; ACCELERATION; PERFORMANCE; PROCESSOR; ALGORITHM; INFERENCE; ENSEMBLE;
DOI
10.1109/TNNLS.2021.3082304
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The staggering innovations and emergence of numerous deep learning (DL) applications have forced researchers to reconsider hardware architecture to accommodate fast and efficient application-specific computations. Applications such as object detection, image recognition, speech translation, music synthesis, and image generation can be performed with high accuracy using DL, at the expense of substantial computational resources. Furthermore, the desire to adopt Industry 4.0 and smart technologies within the Internet of Things infrastructure has initiated several studies to enable on-chip DL capabilities for resource-constrained devices. Specialized DL processors reduce dependence on cloud servers, improve privacy, reduce latency, and mitigate bandwidth congestion. As we reach the limits of shrinking transistors, researchers are exploring various application-specific hardware architectures to meet the performance and efficiency requirements of DL tasks. Over the past few years, several software optimizations and hardware innovations have been proposed to perform these computations efficiently. In this article, we review several DL accelerators, as well as technologies built on emerging devices, to highlight their architectural features on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms. Finally, we discuss design considerations for DL hardware in portable applications, along with deductions about future trends and potential research directions to further innovate DL accelerator architectures. By compiling this review, we hope to help aspiring researchers broaden their knowledge of custom hardware architectures for DL.
Pages: 6068-6088 (21 pages)