DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission

Cited by: 44
Authors
Hill, Parker [1 ]
Jain, Animesh [1 ]
Hill, Mason [1 ,2 ]
Zamirai, Babak [1 ]
Hsu, Chang-Hong [1 ]
Laurenzano, Michael A. [1 ]
Mahlke, Scott [1 ]
Tang, Lingjia [1 ]
Mars, Jason [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Nevada, Las Vegas, NV 89154 USA
Source
50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) | 2017
Funding
U.S. National Science Foundation;
Keywords
GPU Architecture; Deep Neural Networks; Memory Bandwidth; Performance Optimization;
DOI
10.1145/3123939.3123970
CLC Classification Number
TP301 [Theory and Methods];
Discipline Classification Code
081202;
Abstract
Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, image and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled datacenters and the compute needed to service demand. The state-of-the-art techniques to improve DNN performance have significant limitations in bridging the gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck. To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques: (1) synapse vector elimination, a technique that identifies non-contributing synapses in the DNN and carefully transforms data, removing the computation and data movement of these synapses while fully utilizing the GPU to improve performance, and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN is able to achieve an average speedup of 2.1x on real GPU hardware. We also introduce a small additional hardware unit per GPU core to facilitate efficient data fission operations, increasing the speedup achieved by DeftNN to 2.6x.
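The abstract's key point about synapse vector elimination is that, unlike fine-grained pruning, it removes whole synapse vectors and repacks the operands so the remaining work is still a dense matrix multiply that maps well to GPU GEMM kernels. The following is a minimal NumPy sketch of that idea only, not the paper's implementation; the function name, the `keep_ratio` parameter, and the norm-based scoring of "non-contributing" synapses are illustrative assumptions.

```python
import numpy as np

def synapse_vector_elimination(weights, activations, keep_ratio=0.75):
    """Illustrative sketch: drop whole weight columns (synapse vectors)
    with the smallest L2 norm, then run a smaller *dense* matmul.
    keep_ratio is a hypothetical tuning parameter, not from the paper."""
    # Score each synapse vector (column) by magnitude; low-norm columns
    # stand in for "non-contributing" synapses here.
    scores = np.linalg.norm(weights, axis=0)
    k = int(weights.shape[1] * keep_ratio)
    keep = np.sort(np.argsort(scores)[-k:])  # indices of kept columns, in order
    # Repack BOTH operands so the remaining computation stays dense and
    # contiguous, mapping well to GPU GEMM (unlike fine-grained sparsity).
    w_small = weights[:, keep]        # (M, k)
    a_small = activations[keep, :]    # (k, N)
    return w_small @ a_small          # (M, N): same output shape, fewer MACs

# Usage: a (64, 128) weight matrix times (128, 32) activations
W = np.random.default_rng(0).standard_normal((64, 128))
A = np.random.default_rng(1).standard_normal((128, 32))
out = synapse_vector_elimination(W, A)
```

With `keep_ratio=0.75`, the multiply-add count drops by 25% while the kernel remains an ordinary dense GEMM, which is the property that lets this style of pruning yield real speedups on GPU hardware.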
Pages: 786-799
Page count: 14