Custom Hardware Architectures for Deep Learning on Portable Devices: A Review

Cited by: 40
Authors
Zaman, Kh Shahriya [1 ]
Reaz, Mamun Bin Ibne [1 ]
Ali, Sawal Hamid Md [1 ]
Bakar, Ahmad Ashrif A. [1 ]
Chowdhury, Muhammad Enamul Hoque [2 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Elect Elect & Syst Engn, Bangi 43600, Malaysia
[2] Qatar Univ, Dept Elect Engn, Doha 27113, Qatar
Funding
National Research Foundation, Singapore
Keywords
Computer architecture; Hardware; Computational modeling; Artificial neural networks; Optimization; Memory management; Convolution; Application-specific integrated circuit (ASIC); deep learning (DL); deep neural network (DNN); energy-efficient architectures; field-programmable gate array (FPGA); hardware accelerator; machine learning (ML); neural network hardware; review; NEURAL-NETWORKS; CHIP; FRAMEWORK; MEMORY; ACCELERATION; PERFORMANCE; PROCESSOR; ALGORITHM; INFERENCE; ENSEMBLE;
DOI
10.1109/TNNLS.2021.3082304
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The staggering innovations and emergence of numerous deep learning (DL) applications have forced researchers to reconsider hardware architecture to accommodate fast and efficient application-specific computations. Applications such as object detection, image recognition, speech translation, music synthesis, and image generation can be performed with high accuracy using DL, at the expense of substantial computational resources. Furthermore, the desire to adopt Industry 4.0 and smart technologies within the Internet of Things infrastructure has initiated several studies to enable on-chip DL capabilities for resource-constrained devices. Specialized DL processors reduce dependence on cloud servers, improve privacy, reduce latency, and mitigate bandwidth congestion. As we reach the limits of shrinking transistors, researchers are exploring various application-specific hardware architectures to meet the performance and efficiency requirements of DL tasks. Over the past few years, several software optimizations and hardware innovations have been proposed to perform these computations efficiently. In this article, we review several DL accelerators, as well as technologies built on emerging devices, to highlight their architectural features on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms. Finally, we discuss design considerations for DL hardware in portable applications, along with deductions about future trends and potential research directions to further innovate DL accelerator architectures. By compiling this review, we hope to help aspiring researchers broaden their knowledge of custom hardware architectures for DL.
Pages: 6068-6088 (21 pages)