An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Cited by: 8

Authors:

Fang, Jian ^{[1,2]}

Chen, Jianyu ^{[2]}

Lee, Jinho ^{[3]}

Al-Ars, Zaid ^{[2]}

Hofstee, H. Peter ^{[2,4]}

Affiliations:

[1] Natl Innovat Inst Def Technol, Beijing, Peoples R China

[2] Delft Univ Technol, Delft, Netherlands

[3] Yonsei Univ, Seoul, South Korea

[4] IBM Austin, Austin, TX USA

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020 / Vol. 92 / Issue 09

Keywords:

Decompression; FPGA; Acceleration; Snappy; CAPI;

DOI:

10.1007/s11265-020-01547-w

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Making the best use of high-bandwidth storage and network technologies requires faster data decompression. We present a "refine and recycle" method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs, and we present an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) into a set of commands that each target a single bank of block RAM and, rather than performing all the dependency calculations, saves logic by recycling (read) commands that return with an invalid result. A single "Snappy" decompressor implemented in reconfigurable logic using this method can process multiple literal or copy tokens per cycle and achieves up to 7.2 GB/s, which can keep pace with an NVMe device. The proposed method is about an order of magnitude faster and an order of magnitude more power efficient than a state-of-the-art single-core software implementation. The logic and block RAM resources required by the decompressor are low enough that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth provided by the most recent interface technologies.
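The refine-and-recycle idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the token tuples, the names `refine` and `execute`, the 4-byte `LINE_BYTES`, and the per-byte valid flags are all hypothetical choices made for illustration, and bank conflicts and timing are not modeled. Tokens are split so that each command touches a single line (and therefore a single bank), and a read whose source bytes are not yet valid is sent around again instead of having its dependencies computed up front.

```python
from collections import namedtuple

LINE_BYTES = 4  # hypothetical width of one block-RAM line (illustrative only)

Write = namedtuple("Write", "dst data")   # refined literal command
Read = namedtuple("Read", "src dst n")    # refined copy command


def refine(tokens, line_bytes=LINE_BYTES):
    """Split tokens into commands that never cross a line boundary."""
    commands, pos = [], 0
    for tok in tokens:
        if tok[0] == "lit":
            data = tok[1]
            while data:
                n = min(len(data), line_bytes - pos % line_bytes)
                commands.append(Write(pos, data[:n]))
                data, pos = data[n:], pos + n
        else:
            _, offset, length = tok
            if not 1 <= offset <= pos:
                raise ValueError("copy offset points outside the output")
            src = pos - offset
            while length:
                # Cap at `offset` so a command never reads its own output.
                n = min(length, offset,
                        line_bytes - src % line_bytes,
                        line_bytes - pos % line_bytes)
                commands.append(Read(src, pos, n))
                src, pos, length = src + n, pos + n, length - n
    return commands, pos


def execute(commands, size):
    """Issue all pending commands each round; recycle reads that see invalid data."""
    out, valid = bytearray(size), [False] * size
    pending, recycled = list(commands), 0
    while pending:
        seen = list(valid)  # validity visible to commands issued this round
        retry = []
        for cmd in pending:
            if isinstance(cmd, Write):
                n = len(cmd.data)
                out[cmd.dst:cmd.dst + n] = cmd.data
                valid[cmd.dst:cmd.dst + n] = [True] * n
            elif all(seen[cmd.src:cmd.src + cmd.n]):
                out[cmd.dst:cmd.dst + cmd.n] = out[cmd.src:cmd.src + cmd.n]
                valid[cmd.dst:cmd.dst + cmd.n] = [True] * cmd.n
            else:
                retry.append(cmd)  # source not ready yet: recycle the read
                recycled += 1
        pending = retry
    return bytes(out), recycled


tokens = [("lit", b"abcd"), ("copy", 4, 6), ("lit", b"XY"), ("copy", 2, 5)]
commands, size = refine(tokens)
output, recycled = execute(commands, size)
print(output, "recycled reads:", recycled)
```

In this model every literal write completes in the first round regardless of its position in the stream, while copy reads complete as soon as the bytes they depend on have become valid. The earliest pending read always has a valid source, so the loop terminates for any well-formed token stream.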


Pages: 931-947

Page count: 17

References: 24 in total

[1] Adler M., 2015, pigz: A parallel implementation of gzip for modern multi-processor, multi-core machines

[2] Agarwal K.B, 2014, US Patent, Patent No. 8,824,569

[3] Alpha Data, 2018, ADM PCIE 9V3 US MAN

[4] Bartík M, 2015, IEEE I C ELECT CIRC, P179, DOI 10.1109/ICECS.2015.7440278

[5] Fang J, 2018, 2018 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS)

[6] Fang, Jian; Mulder, Yvo T. B.; Hidders, Jan; Lee, Jinho; Hofstee, H. Peter. In-memory database acceleration on FPGAs: a survey [J]. VLDB JOURNAL, 2020, 29 (01): 33-59

[7] Fang, Jian; Chen, Jianyu; Lee, Jinho; Al-Ars, Zaid; Hofstee, H. Peter. Refine and Recycle: A Method to Increase Decompression Parallelism [J]. 2019 IEEE 30TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2019), 2019: 272-280

[8] Fowers, Jeremy; Kim, Joo-Young; Burger, Doug; Hauck, Scott. A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs [J]. 2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015: 52-59

[9] Gilchrist J., 2004, P 16 IASTED INT C PA, P22

[10] Gopal V, 2017, US Patent App, Patent No. 15/374,462
