HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs

Cited by: 3
Authors
Yang, Yi [1 ]
Feng, Min [1 ]
Chakradhar, Srimat [1 ]
Affiliations
[1] NEC Labs Amer, Dept Integrated Syst, Princeton, NJ 08540 USA
Source
PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016 | 2016
Keywords
GPGPU; deep learning; CNN;
DOI
10.1109/ICPP.2016.73
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Their massively parallel computation capability has made GPGPUs a promising platform for convolutional neural networks (CNNs). In this paper, we present HppCnn, a CNN library that achieves both high performance and portability on GPGPUs. In HppCnn, we propose a novel three-step approach to implementing convolutional kernels efficiently with Nvidia cuBLAS. To overcome the limitations of this three-step approach, we improve cuBLAS by enabling nested parallelism, and we implement a low-cost auto-tuning module to leverage existing libraries at runtime. Experiments show that HppCnn achieves significant speedups over both other cuBLAS-based and hand-optimized solutions. The results also show that our solution delivers near-optimal performance on GPUs while retaining portability.
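The paper's three-step cuBLAS formulation is not detailed in this record, but the general technique it builds on is lowering convolution to a single large matrix multiplication (the classic im2col + GEMM approach), so that a tuned BLAS routine such as cuBLAS SGEMM does the heavy lifting. A minimal NumPy sketch of that lowering, for stride 1 and no padding (all function and variable names here are illustrative, not from the library):

```python
import numpy as np

def conv2d_gemm(x, w):
    """Convolution via im2col + GEMM (stride 1, no padding).

    x: input feature maps, shape (C, H, W)
    w: filters, shape (K, C, kh, kw)
    returns: output feature maps, shape (K, H-kh+1, W-kw+1)
    """
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1

    # im2col: gather every kh x kw receptive field into a column,
    # ordered (channel, kernel row, kernel col) to match the
    # flattened filter layout below.
    cols = np.empty((C, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    cols = cols.reshape(C * kh * kw, oh * ow)

    # One large GEMM replaces the direct 6-deep convolution loop nest;
    # on a GPU this call would go to cuBLAS instead of NumPy.
    out = w.reshape(K, C * kh * kw) @ cols
    return out.reshape(K, oh, ow)
```

The trade-off this exposes, which motivates the paper's refinements, is that the lowered `cols` matrix duplicates each input element up to kh*kw times, costing memory bandwidth and capacity in exchange for the high arithmetic efficiency of a single GEMM.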
Pages: 582-587 (6 pages)