Data Reuse for Accelerated Approximate Warps

Cited by: 2
Authors
Peroni, Daniel [1 ]
Imani, Mohsen [1 ]
Nejatollahi, Hamid [2 ]
Dutt, Nikil [2 ]
Rosing, Tajana [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
Keywords
Approximate computing; energy efficiency; floating-point unit (FPU); GPU; warps; NEURAL-NETWORK; MANAGEMENT; MEMORY;
DOI
10.1109/TCAD.2020.2986128
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Many data-driven applications, including computer vision, machine learning, speech recognition, and medical diagnostics, show tolerance to computation error. These applications are often accelerated on GPUs, but the performance improvements come at the cost of high energy usage. In this article, we present DRAAW, an approximate computing technique capable of accelerating GPGPU applications at a warp level. In GPUs, warps are groups of threads that are issued together across multiple cores. The slowest thread dictates the pace of the warp, so DRAAW identifies these bottlenecks and avoids them during approximation. We alleviate computation costs by using an approximate lookup table which tracks recent operations and reuses them to exploit temporal locality within applications. To improve neural network performance, we propose neuron-aware approximation, a technique which profiles operations within network layers and automatically configures DRAAW to ensure computations with more impact on the output accuracy are subject to less approximation. We evaluate our design by placing DRAAW within each core of an Nvidia Kepler architecture Titan GPU. DRAAW improves throughput by up to 2.8x and energy-delay product (EDP) by 5.6x for six GPGPU applications while maintaining less than 5% output error. We show neuron-aware approximation accelerates the inference of six neural networks by 2.9x and improves EDP by 6.2x with less than 1% impact on prediction accuracy.
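The approximate lookup table described in the abstract can be illustrated with a minimal software sketch: operands of a recent operation are quantized (low mantissa bits dropped) and used to index a small table, so that nearby inputs reuse a previously computed result instead of recomputing it. The C++ below is an illustrative sketch under stated assumptions, not the paper's hardware design; the table size, the 12-bit truncation, the hashing, and the names approx_mul and quantize are hypothetical.

// Minimal CPU-side sketch (not the paper's hardware design) of an
// approximate lookup table that reuses recent floating-point results.
// Operands are hashed after dropping low mantissa bits, so "close enough"
// inputs hit a previously computed value instead of recomputing it.
// Table size, 12-bit truncation, and all names are illustrative assumptions.
#include <cstdint>
#include <cstring>
#include <cstdio>
#include <cmath>

struct Entry { uint64_t key; float value; bool valid; };

constexpr int TABLE_BITS = 10;                 // 1024-entry direct-mapped table (assumption)
constexpr int DROP_BITS  = 12;                 // mantissa bits ignored when matching (assumption)
Entry table[1 << TABLE_BITS] = {};

uint32_t quantize(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));      // reinterpret float as raw bits
    return bits & ~((1u << DROP_BITS) - 1);    // clear low mantissa bits
}

// Approximate multiply: reuse a stored result when quantized operands match.
float approx_mul(float a, float b, int &hits, int &misses) {
    uint64_t key = (uint64_t)quantize(a) << 32 | quantize(b);
    size_t idx = (key ^ (key >> 32)) & ((1u << TABLE_BITS) - 1);
    Entry &e = table[idx];
    if (e.valid && e.key == key) { ++hits; return e.value; }  // temporal-locality hit
    ++misses;
    float exact = a * b;                       // fall back to the exact operation
    e = {key, exact, true};                    // remember the result for later reuse
    return exact;
}

int main() {
    int hits = 0, misses = 0;
    double max_rel_err = 0.0;
    // Feed a stream of repeated, slightly perturbed operand pairs to mimic
    // the value locality the abstract exploits.
    for (int i = 0; i < 100000; ++i) {
        float a = 1.0f + 0.001f * (i % 50) + 1e-5f * (i % 3);
        float b = 2.0f + 0.0001f * (i % 50);
        float approx = approx_mul(a, b, hits, misses);
        max_rel_err = std::fmax(max_rel_err, std::fabs(approx - a * b) / std::fabs(a * b));
    }
    std::printf("hits=%d misses=%d max relative error=%.2e\n", hits, misses, max_rel_err);
    return 0;
}

A direct-mapped table keeps the reuse check to a single comparison per operation, in the spirit of the low-latency per-core lookup the abstract implies; in a sketch like this, the number of dropped mantissa bits trades hit rate against output error.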
Pages: 4623 - 4634
Page count: 12