Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data

Cited by: 7
Authors
Zou, Xiangyu [1 ,2 ]
Lu, Tao [5 ]
Xia, Wen [1 ,3 ,4 ]
Wang, Xuan [1 ,2 ]
Zhang, Weizhe [1 ,2 ]
Zhang, Haijun [1 ,2 ]
Di, Sheng [6 ]
Tao, Dingwen [7 ]
Cappello, Franck [6 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Guangdong, Peoples R China
[3] Peng Cheng Lab, Cyberspace Secur Res Ctr, Shenzhen 518055, Guangdong, Peoples R China
[4] Wuhan Natl Lab Optoelect, Wuhan 430074, Hubei, Peoples R China
[5] Marvell Technol Grp, Santa Clara, CA 95054 USA
[6] Argonne Natl Lab, Lemont, IL 60439 USA
[7] Univ Alabama, Tuscaloosa, AL 35487 USA
Funding
US National Science Foundation; National Key R&D Program of China;
Keywords
Lossy compression; high-performance computing; scientific data; compression rate;
DOI
10.1109/TPDS.2020.2972548
CLC Number
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Scientific simulations in high-performance computing (HPC) environments generate vast volumes of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for postanalysis. Unlike traditional data reduction schemes such as deduplication or lossless compression, error-controlled lossy compression can not only significantly reduce the data size but also satisfy user demands on error control. Pointwise relative error bounds (i.e., compression errors that depend on the data values) are widely used by scientific applications with lossy compression, because the error control adapts automatically to the value ranges in the dataset. However, pointwise relative-error-bounded compression is complicated and time consuming. In this article, we develop efficient precomputation-based mechanisms based on the SZ lossy compression framework. Our mechanisms avoid the costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded compression while preserving excellent compression ratios. In addition, we reduce the traversing operations needed for Huffman decoding, significantly accelerating the decompression process in SZ. Experiments with eight well-known real-world scientific simulation datasets show that our solution improves the compression and decompression rates (i.e., the speed) by about 40 and 80 percent, respectively, in most cases, making our lossy compression strategy the best-in-class solution.
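As context for the logarithmic transformation that the paper's precomputation mechanisms optimize away, the baseline idea can be sketched as follows: for strictly positive data, a pointwise relative error bound becomes an absolute error bound in the log domain, where standard uniform quantization applies. This is an illustrative sketch only, not the paper's table-lookup mechanism; the function names are hypothetical and the input is assumed strictly positive.

```python
import numpy as np

def log_transform_compress(data, rel_eb):
    """Baseline log-transform scheme for a pointwise relative error bound.

    Bounding |ln(x') - ln(x)| by ln(1 + eps) guarantees |x' - x| / x <= eps,
    so the relative bound reduces to an absolute bound in the log domain.
    """
    abs_eb = np.log1p(rel_eb)                      # absolute bound in log domain
    logs = np.log(data)                            # assumes data > 0
    # Uniform quantization with bin width 2 * abs_eb keeps the
    # reconstruction within abs_eb of the true log value.
    codes = np.round(logs / (2 * abs_eb)).astype(np.int64)
    return codes, abs_eb

def log_transform_decompress(codes, abs_eb):
    """Invert the quantization and the log transform."""
    return np.exp(codes * 2 * abs_eb)

# Usage: compress random positive data with a 0.1% pointwise relative bound.
data = np.random.uniform(0.1, 100.0, 10_000)
codes, abs_eb = log_transform_compress(data, rel_eb=1e-3)
recon = log_transform_decompress(codes, abs_eb)
# Every reconstructed value stays within the relative bound
# (tiny slack for floating-point rounding).
assert np.all(np.abs(recon - data) / data <= 1e-3 * (1 + 1e-9))
```

The per-value `np.log`/`np.exp` calls in this baseline are exactly the cost the paper's precomputation-based table lookup is designed to eliminate.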
Pages: 1665 - 1680
Number of pages: 16