Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach

Times cited: 191
Authors
Kwon, Hyoukjun [1]
Chatarasi, Prasanth [1]
Pellauer, Michael [2]
Parashar, Angshuman [2]
Sarkar, Vivek [1]
Krishna, Tushar [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] NVIDIA, Westford, MA USA
Source
MICRO'52: THE 52ND ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE | 2019
Funding
National Science Foundation (USA);
Keywords
Neural networks; Dataflow; Cost modeling; Transformations;
DOI
10.1145/3352460.3358252
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, which directly impacts the performance and energy efficiency of DNN accelerators. An accelerator microarchitecture dictates the dataflow(s) that can be employed to execute layers in a DNN. Selecting a dataflow for a layer can have a large impact on utilization and energy efficiency, but there is a lack of understanding on the choices and consequences of dataflows, and of tools and methodologies to help architects explore the co-optimization design space. In this work, we first introduce a set of data-centric directives to concisely specify the DNN dataflow space in a compiler-friendly form. We then show how these directives can be analyzed to infer various forms of reuse and to exploit them using hardware capabilities. We codify this analysis into an analytical cost model, MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Reuse and Occupancy), that estimates various cost-benefit tradeoffs of a dataflow including execution time and energy efficiency for a DNN model and hardware configuration. We demonstrate the use of MAESTRO to drive a hardware design space exploration experiment, which searches across 480M designs to identify 2.5M valid designs at an average rate of 0.17M designs per second, including Pareto-optimal throughput- and energy-optimized design points.
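The design space exploration figures in the abstract can be put in proportion with a little arithmetic. The sketch below is illustrative only: it assumes the quoted rate of 0.17M designs per second applies to all 480M explored designs (the abstract does not say whether the rate counts explored or valid designs), and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
total_designs = 480e6     # designs searched
valid_designs = 2.5e6     # designs identified as valid
rate_per_second = 0.17e6  # ASSUMPTION: rate covers all explored designs

# Share of the searched space that turned out to be valid.
valid_fraction = valid_designs / total_designs

# Implied wall-clock time for the full search under the assumption above.
search_seconds = total_designs / rate_per_second

print(f"valid fraction: {valid_fraction:.2%}")
print(f"implied search time: {search_seconds / 60:.0f} min")
```

Under that assumption, roughly 0.5% of the searched designs are valid and the whole search takes about 47 minutes.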
Pages: 754-768
Number of pages: 15