HppCnn: A High-Performance, Portable Deep-Learning Library for GPGPUs

Cited by: 3
Authors
Yang, Yi [1 ]
Feng, Min [1 ]
Chakradhar, Srimat [1 ]
Affiliations
[1] NEC Labs Amer, Dept Integrated Syst, Princeton, NJ 08540 USA
Source
PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016 | 2016
Keywords
GPGPU; deep learning; CNN;
DOI
10.1109/ICPP.2016.73
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Their massively parallel computation capability has made GPGPUs a promising platform for convolutional neural networks (CNNs). In this paper, we present HppCnn, a CNN library that achieves both high performance and portability on GPGPUs. In HppCnn, we propose a novel three-step approach to implementing convolutional kernels efficiently with Nvidia cuBLAS. To overcome the limitations of this three-step approach, we improve cuBLAS by enabling nested parallelism, and we implement a low-cost auto-tuning module to leverage existing libraries at runtime. Experiments show that HppCnn achieves significant speedups over both other cuBLAS-based and hand-optimized solutions. The results also show that our solution delivers near-optimal performance on GPUs while retaining portability.
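The paper's three-step cuBLAS formulation is not detailed in this record, but the general technique it builds on is lowering convolution to a single large matrix multiplication (the classic im2col + GEMM approach), so that a tuned BLAS routine such as cuBLAS SGEMM does the heavy lifting. A minimal NumPy sketch of that lowering, for stride 1 and no padding (all function and variable names here are illustrative, not from the library):

```python
import numpy as np

def conv2d_gemm(x, w):
    """Convolution via im2col + GEMM (stride 1, no padding).

    x: input feature maps, shape (C, H, W)
    w: filters, shape (K, C, kh, kw)
    returns: output feature maps, shape (K, H-kh+1, W-kw+1)
    """
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1

    # im2col: gather every kh x kw receptive field into a column,
    # ordered (channel, kernel row, kernel col) to match the
    # flattened filter layout below.
    cols = np.empty((C, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + oh, j:j + ow]
    cols = cols.reshape(C * kh * kw, oh * ow)

    # One large GEMM replaces the direct 6-deep convolution loop nest;
    # on a GPU this call would go to cuBLAS instead of NumPy.
    out = w.reshape(K, C * kh * kw) @ cols
    return out.reshape(K, oh, ow)
```

The trade-off this exposes, which motivates the paper's refinements, is that the lowered `cols` matrix duplicates each input element up to kh*kw times, costing memory bandwidth and capacity in exchange for the high arithmetic efficiency of a single GEMM.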
Pages: 582-587 (6 pages)