Computationally Efficient DNN Mapping Search Heuristic using Deep Reinforcement Learning

Cited by: 0
Authors
Bakshi, Suyash [1]
Johnsson, Lennart [1]
Affiliations
[1] Univ Houston, Dept Comp Sci, Philip Guthrie Hoffman Hall,3551 Cullen Blvd, Houston, TX 77204 USA
Keywords
Reinforcement learning; DNN; mapping search; convolution
DOI
10.1145/3609110
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we present a computationally efficient Reinforcement Learning (RL) mapping search heuristic for finding high-quality mappings for N-dimensional convolution loops. The heuristic uses a computationally inexpensive reward function, based on the potential data reuse of operands, to guide the search process. We also present an RL state representation that generalizes to N-dimensional convolution loops, and a state representation parsing strategy that ensures only valid mappings are evaluated for quality. Our RL search heuristic is applicable to multi-core systems with a memory hierarchy. We show that, for a range of 3D convolution layers, our RL-based search heuristic generally yields mappings with lower Energy-Delay Product (EDP) than random search, at significantly lower computational expense, for an architecture with multiple processing elements with shared memory connected to DRAM. Our evaluation across 19 3D convolution layers shows that our RL method performed, on average, only 11.24% of the operations performed by Timeloop's random search when assessing the same number of valid mappings. The mappings found using Timeloop had, on average, 12.51% higher EDP than the lowest-EDP mappings found using our RL method. Further, the lowest-EDP mappings found using our method had, on average, only 4.69x higher EDP than the theoretical lower-bound EDP, with the best case being only 1.29x higher.
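The two quantities the abstract relies on, EDP and the potential data reuse of operands, can be illustrated with a minimal Python sketch. This is not the authors' reward function or Timeloop's cost model. It assumes the textbook definition EDP = energy × delay, and it assumes that "potential reuse" of an operand is the number of multiply-accumulate operations divided by that operand's element count, for a unit-stride, unpadded convolution. All function and parameter names are hypothetical.

```python
from math import prod


def edp(energy_joules, delay_seconds):
    """Energy-Delay Product: energy multiplied by delay."""
    return energy_joules * delay_seconds


def percent_higher(candidate, reference):
    """How much higher `candidate` is than `reference`, in percent."""
    return 100.0 * (candidate - reference) / reference


def operand_reuse(out_channels, in_channels, out_dims, filt_dims):
    """Upper bound on reuse per operand for an N-dimensional convolution.

    Assumption (not from the paper): unit stride, no padding, so each
    input dimension is out + filt - 1. Reuse is MACs per operand element.
    """
    macs = out_channels * in_channels * prod(out_dims) * prod(filt_dims)
    weights = out_channels * in_channels * prod(filt_dims)
    outputs = out_channels * prod(out_dims)
    in_dims = [o + f - 1 for o, f in zip(out_dims, filt_dims)]
    inputs = in_channels * prod(in_dims)
    return {
        "weights": macs / weights,
        "inputs": macs / inputs,
        "outputs": macs / outputs,
    }


# A small hypothetical 3D convolution layer: 2 output channels,
# 3 input channels, 4x4x4 output volume, 3x3x3 filter.
reuse = operand_reuse(2, 3, (4, 4, 4), (3, 3, 3))
```

For this hypothetical layer each weight can be reused 64 times, each input element 16 times, and each output element accumulates 81 partial products. A mapping that keeps the most reusable operand in the memory level closest to the processing elements would be expected to reduce DRAM traffic, which is the intuition behind a reuse-based score.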
Pages: 21