Compression acceleration using GPGPU

被引：0

作者：

Shastry, Krishnaprasad ^{[1
]}

Pandey, Avinash ^{[1
]}

Agrawal, Ashutosh ^{[1
]}

Sarveswara, Ravi ^{[1
]}

机构：

[1] Hewlett Packard Enterprise, Palo Alto, CA 94304 USA

来源：

2016 23RD IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING WORKSHOPS (HIPCW 2016) | 2016年

关键词：

compression; bzip2; acceleration; gpu; burrows-wheeler-transform; suffix-array;

D O I：

10.1109/HiPCW.2016.23

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

The CPU has become a bottleneck for bigdata solutions that have to deal with increasingly large and varied data. Two approaches for dealing with this bottleneck have generated considerable interest amongst researchers in the recent past. The first approach is the building of heterogeneous systems that offload a part of the computational workload to co-processors. The offload of a part of the computation to a co-processor causes CPU cycles to be freed up. These freed-up CPU cycles can then be used for running other tasks. Offloading also allows the building of accelerated solutions that run significantly faster. Thus, offloading allows the processing of larger workloads by the system. The second approach that has generated interest is the compression of data. The storage and network latencies for bigdata solutions are reduced significantly when data compression techniques are employed. The associated bandwidth requirements too are reduced, as are the hardware requirements. However, a marriage of the two approaches - compression on a co-processor - has remained a challenge due to the inherent characteristics of the compression algorithms that have made them poor choices for implementation on co-processors such as GPGPUs. In this paper, an implementation of the popular bzip2 compression algorithm that utilizes a GPGPU for a part of its computing is presented. A marginal improvement of 1.4x in performance when compared to the standard CPU-based algorithm is delivered by this implementation, while freeing up CPU cycles by a significant 50% at the same time. This speedup is better than any previously published work. The challenges in getting a good performance for bzip2 on a GPU are discussed.

引用

页码：70 / 78

页数：9

共 17 条

[1]

[Anonymous], 1994, 124 SRC DIG EQ CORP

[2]

Bentley J. L., 1986, COMMUN ACM, V29, P262

[3]

Cloud R. L., ACCELERATING L UNPUB

[4]

Deshpande Aditya, 2014, IIITTR2014

[5]

Farach M., P 38 ANN S FDN COMP, P137

[6]

Guo GuiXin, 2013, IEEE INT C BIGD

[7]

Harada Takahiro, INTRO GPU RADI UNPUB

[8] A METHOD FOR THE CONSTRUCTION OF MINIMUM-REDUNDANCY CODES [J].

HUFFMAN, DA .

PROCEEDINGS OF THE INSTITUTE OF RADIO ENGINEERS, 1952, 40 (09) :1098-1101

[9]

Karp Richard M., P 4 ANN ACM S THEOR, P125

[10] SUFFIX ARRAYS - A NEW METHOD FOR ONLINE STRING SEARCHES [J].

MANBER, U ;

MYERS, G .

SIAM JOURNAL ON COMPUTING, 1993, 22 (05) :935-948

← 1 2 →