Accelerating Divergent Applications on SIMD Architectures Using Neural Networks

Cited by: 14
Authors
Grigorian, Beayna [1 ]
Reinman, Glenn [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
Funding
U.S. National Science Foundation
Keywords
Design; Performance
DOI
10.1145/2717311
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline Classification Code
0812
Abstract
The purpose of this research is to find a neural-network-based solution to the well-known problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our approach differs from existing techniques that handle branch (or control-flow) divergence, which rely on costly hardware modifications, low-utilization masking techniques, or static prediction methods. As we examine divergent applications, we characterize the degree of data-dependent control flow seen in each and isolate the code regions (or "kernels") that cause the most performance degradation due to branch divergence. We then train neural networks (NNs) offline to approximate these kernels and inject the NN computations directly into the applications as substitutes for the kernels they approximate. This essentially translates control flow into nondivergent computation, trading off precision for performance. As our methodology manipulates application source code directly, it is inherently platform agnostic and can be adopted as a general means for accelerating divergent applications on data-parallel architectures. In this article, we present the Neuralizer, an automated software flow for kernel identification, NN training, and NN integration, as well as supplementary user-controlled optimization techniques. Evaluating our approach on a variety of divergent applications run on a Graphics Processing Unit (GPU), we achieve average performance gains of 13.6x and energy savings of 14.8x with 96% accuracy.
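For illustration only (this is not code from the paper or the Neuralizer tool): the minimal CUDA sketch below shows how a data-dependent branch in a GPU kernel might be replaced by straight-line arithmetic evaluating a tiny, hypothetical 1-4-1 multilayer perceptron, so that every thread in a warp executes the same instruction stream. All names and sizes (divergentKernel, neuralizedKernel, HIDDEN, w1, b1, w2, b2) are invented for this example; in the described flow the weights would come from offline training against the original kernel's input/output behavior, and the substitution would be kept only if the resulting accuracy loss is acceptable.

    #include <cuda_runtime.h>
    #include <math.h>

    // Original kernel: threads in the same warp take different paths
    // depending on the data in x, causing branch divergence.
    __global__ void divergentKernel(const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (x[i] > 0.5f) {                  // data-dependent branch
            y[i] = sqrtf(x[i]) * 2.0f;
        } else {
            y[i] = x[i] * x[i] + 1.0f;
        }
    }

    // Hypothetical NN substitute: a small 1-4-1 MLP (weights trained
    // offline) approximates the same mapping without any branches.
    #define HIDDEN 4
    __constant__ float w1[HIDDEN];   // input-to-hidden weights
    __constant__ float b1[HIDDEN];   // hidden biases
    __constant__ float w2[HIDDEN];   // hidden-to-output weights
    __constant__ float b2;           // output bias

    __global__ void neuralizedKernel(const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float acc = b2;
        #pragma unroll
        for (int h = 0; h < HIDDEN; ++h) {
            // sigmoid activation; straight-line code, identical for all threads
            float a = 1.0f / (1.0f + expf(-(w1[h] * x[i] + b1[h])));
            acc += w2[h] * a;
        }
        y[i] = acc;                         // approximate, nondivergent result
    }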
Pages: 23