At-Scale Assessment of Weight Clustering for Energy-Efficient Object Detection Accelerators

Times cited: 0
Authors
Caro, Marti [1, 2]
Tabani, Hamid [1 ]
Abella, Jaume [1 ]
Affiliations
[1] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[2] Univ Politecn Catalunya UPC, Barcelona, Spain
Source
37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING | 2022
Keywords
DOI
10.1145/3477314.3507161
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
DNN-based object detection operates on large data volumes to fetch images and DNN weights, which leads to high power and bandwidth demands. Solutions to mitigate those demands, such as weight clustering, are normally studied on limited examples of a much smaller scale than the target applications, which makes it difficult to determine the best tradeoff to implement. This paper performs an at-scale assessment (using a real-life application) of weight clustering for a DNN-based object detection system, You Only Look Once (YOLO), considering real driving videos. Our case study shows that, for an Output Stationary accelerator (e.g. a systolic array), restricting weights to between 32 (5-bit) and 256 (8-bit) different values preserves the accuracy of the original 32-bit YOLO weights while decreasing bandwidth requirements to around 30%-40% of the original bandwidth, and overall energy consumption to around 45% of the original consumption. Overall, our case study provides key insights on which to base design decisions for an accelerator for camera-based object detection.
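As a rough illustration of the weight clustering idea mentioned in the abstract, the sketch below quantizes a list of weights to 2^b shared values using a plain 1-D k-means (Lloyd's algorithm) and estimates the resulting weight traffic relative to 32-bit weights. This is not the authors' implementation: the function names `cluster_weights` and `weight_traffic_ratio`, the linear centroid initialization, and the codebook-plus-indices storage model are all assumptions made for illustration. The ratio covers weight storage only, so it is not expected to match the 30%-40% total bandwidth figure reported in the abstract, which also accounts for other traffic such as images.

```python
def nearest(centroids, w):
    """Return the index of the centroid closest to weight w."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - w))


def cluster_weights(weights, bits, iters=10):
    """Cluster weights into 2**bits shared values (hypothetical helper).

    Returns (centroids, indices): each weight is replaced by a
    bits-wide index into the centroid table (the codebook).
    """
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Assumption: centroids start evenly spaced over the weight range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for w in weights:
            idx = nearest(centroids, w)
            sums[idx] += w
            counts[idx] += 1
        # Move each centroid to the mean of its members; keep empty ones.
        centroids = [
            sums[i] / counts[i] if counts[i] else centroids[i]
            for i in range(k)
        ]
    indices = [nearest(centroids, w) for w in weights]
    return centroids, indices


def weight_traffic_ratio(bits, n_weights, full_bits=32):
    """Clustered weight size (indices + codebook) over full-precision size."""
    clustered = n_weights * bits + (2 ** bits) * full_bits
    return clustered / (n_weights * full_bits)
```

For example, `weight_traffic_ratio(5, n)` and `weight_traffic_ratio(8, n)` bracket the 5-bit to 8-bit range discussed in the abstract for a layer with `n` weights, under the storage model assumed here.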
Pages: 530-533
Page count: 4