Hardware Architecture Exploration for Deep Neural Networks

Cited by: 2
Authors
Zheng, Wenqi [1 ]
Zhao, Yangyi [1 ]
Chen, Yunfan [1 ]
Park, Jinhong [2 ]
Shin, Hyunchul [1 ]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Ansan, South Korea
[2] Samsung Elect Inc, Suwon, South Korea
Keywords
AI architecture; Neural network architecture; CNN; Design space exploration;
DOI
10.1007/s13369-021-05455-4
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Owing to their good performance, deep Convolutional Neural Networks (CNNs) are rapidly gaining popularity across a broad range of applications. Since high-accuracy CNNs are both computation-intensive and memory-intensive, many researchers have shown significant interest in accelerator design. Furthermore, the AI chip market is growing, and competition over the performance, cost, and power consumption of artificial intelligence SoC designs is intensifying. It is therefore important to develop design techniques and platforms for the efficient design of optimized AI architectures that satisfy given specifications within a short design time. In this research, we have developed design space exploration techniques and environments for the optimal design of the overall system, including computing modules and memories. Our current design platform uses the NVIDIA Deep Learning Accelerator as the computing model, SRAM as a buffer, and GDDR6 DRAM as off-chip memory. We also developed a program to estimate the processing time of a given neural network. By varying both the on-chip SRAM size and the computing module size, a designer can explore the design space efficiently and then choose the optimal architecture, i.e., the one with minimal cost that satisfies the performance specification. To illustrate the operation of the design platform, two well-known deep CNNs are used: YOLOv3 and Faster R-CNN. This technology can be used to explore and optimize CNN hardware architectures so that cost is minimized.
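The abstract describes the exploration flow only in prose; the following minimal Python sketch illustrates the idea. It assumes a simple roofline-style latency model (per-layer latency is the maximum of compute time and off-chip memory time) and sweeps candidate computing-module and SRAM-buffer sizes to find the cheapest configuration meeting a latency specification. All names (Layer, layer_time, explore), the candidate sizes, the frequency and bandwidth figures, and the cost weights are illustrative assumptions, not the authors' actual estimator.

    # Minimal design-space-exploration sketch (illustrative assumptions only).
    from dataclasses import dataclass

    @dataclass
    class Layer:
        macs: int          # multiply-accumulate operations in the layer
        traffic: int       # bytes of off-chip traffic if the buffer is too small
        footprint: int     # bytes that must stay resident on chip

    def layer_time(layer, mac_units, sram_bytes, freq_hz=1e9, dram_bw=48e9):
        # Roofline-style assumption: latency = max(compute time, memory time).
        compute_s = layer.macs / (mac_units * freq_hz)
        # If the working set fits in the SRAM buffer, assume DRAM traffic is hidden.
        memory_s = 0.0 if layer.footprint <= sram_bytes else layer.traffic / dram_bw
        return max(compute_s, memory_s)

    def explore(network, latency_spec_s):
        # Sweep computing-module size (MAC units) and on-chip SRAM size,
        # keeping the cheapest configuration that meets the latency spec.
        best = None
        for mac_units in (256, 512, 1024, 2048):
            for sram_kib in (128, 256, 512, 1024, 2048):
                total_s = sum(layer_time(l, mac_units, sram_kib * 1024)
                              for l in network)
                if total_s > latency_spec_s:
                    continue  # performance specification not satisfied
                cost = 1.0 * mac_units + 0.5 * sram_kib   # assumed area weights
                if best is None or cost < best[0]:
                    best = (cost, mac_units, sram_kib, total_s)
        return best

    # Toy usage: a two-layer "network" and a 10 ms latency target.
    toy = [Layer(macs=2_000_000_000, traffic=50_000_000, footprint=300_000),
           Layer(macs=500_000_000, traffic=20_000_000, footprint=2_000_000)]
    print(explore(toy, latency_spec_s=0.010))

The paper's platform models the NVDLA datapath and memory hierarchy in far more detail, but the sweep structure matches the described flow: vary the SRAM and computing-module sizes, keep configurations that meet the specification, and select the cheapest.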
Pages: 9703-9712
Page count: 10
Related Papers
25 items in total
  • [1] [Anonymous], 2018, Graphics Double Data Rate 6 (GDDR6)
  • [2] [Anonymous], 2020, High Bandwidth Memory
  • [3] Arthur, 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • [4] Chen, Tianshi; Du, Zidong; Sun, Ninghui; Wang, Jia; Wu, Chengyong; Chen, Yunji; Temam, Olivier. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. ACM SIGPLAN Notices, 2014, 49(4): 269-283.
  • [5] Chen, Yunji; Luo, Tao; Liu, Shaoli; Zhang, Shijin; He, Liqiang; Wang, Jia; Li, Ling; Chen, Tianshi; Xu, Zhiwei; Sun, Ninghui; Temam, Olivier. DaDianNao: A Machine-Learning Supercomputer. 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2014: 609-622.
  • [6] Du, Zidong; Fasthuber, Robert; Chen, Tianshi; Ienne, Paolo; Li, Ling; Luo, Tao; Feng, Xiaobing; Chen, Yunji; Temam, Olivier. ShiDianNao: Shifting Vision Processing Closer to the Sensor. ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015: 92-104.
  • [7] Enrico, 2019, IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • [8] Han, Song; Liu, Xingyu; Mao, Huizi; Pu, Jing; Pedram, Ardavan; Horowitz, Mark A.; Dally, William J. EIE: Efficient Inference Engine on Compressed Deep Neural Network. ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016: 243-254.
  • [9] Irtiza, 2020, arXiv preprint
  • [10] Redmon, Joseph. YOLOv3: An Incremental Improvement. 2018. DOI: 10.48550/arXiv.1804.02767