An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Cited by: 8
Authors
Fang, Jian [1, 2]
Chen, Jianyu [2 ]
Lee, Jinho [3 ]
Al-Ars, Zaid [2 ]
Hofstee, H. Peter [2, 4]
Affiliations
[1] Natl Innovat Inst Def Technol, Beijing, Peoples R China
[2] Delft Univ Technol, Delft, Netherlands
[3] Yonsei Univ, Seoul, South Korea
[4] IBM Austin, Austin, TX USA
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020 / Vol. 92 / Issue 09
Keywords
Decompression; FPGA; Acceleration; Snappy; CAPI;
DOI
10.1007/s11265-020-01547-w
CLC number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Making the best use of high-bandwidth storage and network technologies requires faster data decompression. We present a "refine and recycle" method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs, together with an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) into a set of commands that each target a single bank of block RAM and, rather than performing all the dependency calculations, saves logic by recycling (read) commands that return with an invalid result. A single "Snappy" decompressor implemented in reconfigurable logic using this method can process multiple literal or copy tokens per cycle and achieves up to 7.2 GB/s, which can keep pace with an NVMe device. The proposed method is about an order of magnitude faster and an order of magnitude more power efficient than a state-of-the-art single-core software implementation. The logic and block RAM resources required by the decompressor are low enough that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth provided by the most recent interface technologies.
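To make the terms in the abstract concrete, the following is a minimal software sketch, not the authors' hardware design. `decode` shows how literal and copy tokens rebuild the output in an LZ77-type format, and `refine` illustrates splitting one byte-range command into pieces that each touch a single memory bank. The token representation, the function names, and the bank geometry (`BANK_BYTES`, `NUM_BANKS`, word-interleaved addressing) are assumptions chosen for illustration; the record does not specify them, and the recycling of invalid reads is not modeled here.

```python
from typing import List, Tuple, Union

Literal = bytes            # literal token: raw bytes appended to the output
Copy = Tuple[int, int]     # copy token: (offset back from end of output, length)
Token = Union[Literal, Copy]


def decode(tokens: List[Token]) -> bytes:
    """Sequentially decode literal/copy tokens (generic LZ77-style model)."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, bytes):
            out += tok  # literal: write the bytes as-is
        else:
            offset, length = tok
            if offset <= 0 or offset > len(out):
                raise ValueError("copy offset outside decoded history")
            start = len(out) - offset
            # Byte-by-byte so a copy may overlap the bytes it is producing.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


# Hypothetical bank geometry, assumed only for this illustration.
BANK_BYTES = 4    # bytes per bank word
NUM_BANKS = 16    # banks interleaved by word address


def refine(addr: int, length: int) -> List[Tuple[int, int, int]]:
    """Split the range [addr, addr + length) into (bank, addr, length)
    commands, each confined to one bank word."""
    cmds = []
    end = addr + length
    while addr < end:
        boundary = (addr // BANK_BYTES + 1) * BANK_BYTES  # next word boundary
        piece = min(end, boundary) - addr
        bank = (addr // BANK_BYTES) % NUM_BANKS
        cmds.append((bank, addr, piece))
        addr += piece
    return cmds
```

For example, under these assumptions `decode([b"abc", (3, 6)])` expands an overlapping copy into `b"abcabcabc"`, and `refine(6, 7)` yields three single-bank commands covering bytes 6 to 12.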
Pages: 931-947
Page count: 17