DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators

Cited by: 0
Authors
Ranawaka, Piyumal [1 ]
Azhar, Muhammad Waqar [1 ]
Stenstrom, Per [1 ]
Affiliations
[1] Chalmers Univ Technol, Gothenburg, Sweden
Funding
Swedish Research Council;
Keywords
DNN acceleration; Loop Re-Order; Loop Blocking; Reuse Distance; Energy Efficient DNN Acceleration; On-chip Memory Management; DESIGN SPACE EXPLORATION; PERFORMANCE; COMPILER; REUSE;
DOI
10.1145/3649153.3649196
CLC Classification
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Deep neural network (DNN) accelerators suffer from poor utilization of on-chip memory, which reduces performance and energy efficiency. Loop reordering and blocking are used to improve on-chip memory utilization in DNN accelerators. However, existing optimization frameworks are inefficient, either because of the prohibitive time complexity of searching the entire search space or because of a sub-optimal choice of optimizations. This paper proposes DNNOPT, a hardware/software framework for optimally selecting loop orders and blocking factors, applying loop reordering and blocking in isolation or in combination. DNNOPT uses the proposed Early-exit and Strided-search strategies to prune the search space, and simple analytical models of data reuse to evaluate each optimization point efficiently and accurately. Overall, DNNOPT reduces the search space by more than two orders of magnitude and improves performance, energy efficiency, and time to solution by, on average, 1.8x, 50%, and 226x, respectively, for convolutional neural network (CNN) and Transformer applications compared to state-of-the-art frameworks.
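The abstract only outlines the approach at a high level. Below is a minimal, hypothetical sketch of the idea, assuming a single-level on-chip buffer and blocking only the output-channel loop of a convolution; the names (footprint_bytes, dram_traffic, pick_k_block, k_block, sram_bytes) are illustrative and are not DNNOPT's actual interface. It shows how a strided search over blocking factors can be pruned with an early exit once the modeled on-chip working set no longer fits, with a simple analytical data-reuse model scoring each remaining candidate.

```python
# Hypothetical sketch (not the paper's implementation): pick a blocking factor
# for the output-channel loop of a convolution using a strided search with an
# early exit, scored by a simple analytical model of on-chip footprint and
# off-chip (DRAM) traffic.

from dataclasses import dataclass


@dataclass
class ConvLayer:
    # Simplified convolution layer shape (batch size omitted).
    in_ch: int            # input channels (C)
    out_ch: int           # output channels (K)
    out_h: int            # output height (P)
    out_w: int            # output width (Q)
    kernel: int           # square kernel size (R = S)
    bytes_per_elem: int = 2  # e.g. 16-bit operands


def footprint_bytes(layer: ConvLayer, k_block: int) -> int:
    """On-chip working set when the output-channel loop is blocked by k_block:
    one block of weights, one block of partial outputs, and one input tile."""
    weights = k_block * layer.in_ch * layer.kernel * layer.kernel
    outputs = k_block * layer.out_h * layer.out_w
    inputs = layer.in_ch * (layer.out_h + layer.kernel - 1) * (layer.out_w + layer.kernel - 1)
    return (weights + outputs + inputs) * layer.bytes_per_elem


def dram_traffic(layer: ConvLayer, k_block: int) -> int:
    """Rough analytical reuse model: the input tile is re-read once per
    output-channel block; weights and outputs move on/off chip once."""
    blocks = -(-layer.out_ch // k_block)  # ceiling division
    inputs = blocks * layer.in_ch * (layer.out_h + layer.kernel - 1) * (layer.out_w + layer.kernel - 1)
    weights = layer.out_ch * layer.in_ch * layer.kernel * layer.kernel
    outputs = layer.out_ch * layer.out_h * layer.out_w
    return (inputs + weights + outputs) * layer.bytes_per_elem


def pick_k_block(layer: ConvLayer, sram_bytes: int, stride: int = 4) -> int:
    """Strided search over candidate blocking factors with an early exit:
    candidates are visited in increasing order and the search stops as soon
    as the on-chip footprint no longer fits the buffer."""
    best, best_traffic = 1, dram_traffic(layer, 1)
    for k_block in range(stride, layer.out_ch + 1, stride):
        if footprint_bytes(layer, k_block) > sram_bytes:
            break  # early exit: every larger block is also infeasible
        traffic = dram_traffic(layer, k_block)
        if traffic < best_traffic:
            best, best_traffic = k_block, traffic
    return best


if __name__ == "__main__":
    layer = ConvLayer(in_ch=64, out_ch=256, out_h=56, out_w=56, kernel=3)
    print(pick_k_block(layer, sram_bytes=512 * 1024))
```

The early exit is sound in this simplified model because the working set grows monotonically with the blocking factor, so once one candidate overflows the buffer every larger candidate does too; the stride trades search time for resolution, mirroring the pruning trade-off described in the abstract.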
Pages: 126-137
Page count: 12