PFP: Improving the Reliability of Deduplication-based Storage Systems with Per-File Parity

Times cited: 10
Authors
Wu, Suzhen [1]
Mao, Bo [2]
Jiang, Hong [3]
Luan, Huagao [1]
Zhou, Jindong [2]
Affiliations
[1] Xiamen Univ, Dept Comp Sci, Xiamen 361005, Fujian, Peoples R China
[2] Xiamen Univ, Software Sch, Xiamen 361005, Fujian, Peoples R China
[3] Univ Texas Arlington, Comp Sci & Engn Dept, Arlington, TX 76019 USA
Funding
National Natural Science Foundation of China
Keywords
Data deduplication; reliability; per-file parity; intra-file recovery; inter-file recovery
DOI
10.1109/TPDS.2019.2898942
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Data deduplication weakens the reliability of storage systems because, by design, it removes duplicate data chunks common to different files and forces those files to share a single physical data chunk, or critical chunk, after deduplication. Thus, the loss of a single critical chunk can potentially render all files that reference (share) it unavailable. However, the reliability issue in deduplication-based storage systems has not received adequate attention. Existing approaches introduce data redundancy after files have been deduplicated, either by replicating critical data chunks (i.e., chunks with high reference counts) or by applying RAID schemes to unique data chunks; that is, these schemes operate on individual unique data chunks rather than on individual files. This can leave individual files vulnerable to loss, particularly in the presence of transient and unrecoverable data chunk errors such as latent sector errors. To address this file reliability issue, this paper proposes a Per-File Parity (PFP) scheme to improve the reliability of deduplication-based storage systems. PFP computes XOR parity within parity groups of each file's data chunks after the chunking process but before the chunks are deduplicated. Therefore, PFP provides parity-based redundancy for all files through intra-file recovery, and a higher level of protection for data chunks with high reference counts through inter-file recovery. Our reliability analysis and extensive data-driven, failure-injection-based experiments on a prototype implementation of PFP show that PFP significantly outperforms the existing redundancy solutions, DTR and RCR, in system reliability, tolerating multiple data chunk failures and guaranteeing file availability upon multiple data chunk failures. Moreover, a performance evaluation shows that PFP incurs an average performance degradation of only 5.7 percent on the deduplication-based storage system.
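The core mechanism described above (XOR parity over groups of one file's chunks, computed after chunking but before deduplication) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the authors' prototype: fixed-size chunking, SHA-1 fingerprints, the hypothetical names `DedupStore`, `FileRecipe`, `repair` and `UnrecoverableChunkError`, the hypothetical `group_size` parameter, and keeping each file's parity un-deduplicated in its recipe are all illustrative choices. The `repair` method reflects one reading of "inter-file recovery": a critical chunk shared by several files sits in a parity group of each of those files, so any one of them may be able to rebuild it.

```python
import hashlib
from dataclasses import dataclass


class UnrecoverableChunkError(Exception):
    """Raised when a parity group has lost more than one chunk."""


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class FileRecipe:
    fingerprints: list  # chunk fingerprints in file order
    lengths: list       # original chunk lengths (parity is zero-padded)
    parities: list      # one XOR parity per group of this file's chunks


class DedupStore:
    """Hypothetical toy store: per-file parity is computed before dedup."""

    def __init__(self, chunk_size: int = 4, group_size: int = 3):
        self.chunk_size = chunk_size
        self.group_size = group_size
        self.chunks = {}  # fingerprint -> the single shared physical copy

    def write(self, data: bytes) -> FileRecipe:
        # 1. Chunking (fixed-size here purely for simplicity).
        size = self.chunk_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        # 2. Per-file XOR parity over groups, BEFORE deduplication.
        parities = []
        for g in range(0, len(chunks), self.group_size):
            parity = b""
            for c in chunks[g:g + self.group_size]:
                parity = xor_bytes(parity, c)
            parities.append(parity)
        # 3. Deduplication: keep one copy per SHA-1 fingerprint.
        fps = []
        for c in chunks:
            fp = hashlib.sha1(c).hexdigest()
            self.chunks.setdefault(fp, c)
            fps.append(fp)
        return FileRecipe(fps, [len(c) for c in chunks], parities)

    def _recover(self, recipe: FileRecipe, i: int) -> bytes:
        # Intra-file recovery: parity XOR the surviving chunks of the group.
        start = (i // self.group_size) * self.group_size
        end = min(start + self.group_size, len(recipe.fingerprints))
        acc = recipe.parities[i // self.group_size]
        for j in range(start, end):
            if j == i:
                continue
            other = self.chunks.get(recipe.fingerprints[j])
            if other is None:
                raise UnrecoverableChunkError("second loss in parity group")
            acc = xor_bytes(acc, other)
        return acc[:recipe.lengths[i]]  # strip zero padding

    def read(self, recipe: FileRecipe) -> bytes:
        out = []
        for i, fp in enumerate(recipe.fingerprints):
            chunk = self.chunks.get(fp)
            out.append(chunk if chunk is not None else self._recover(recipe, i))
        return b"".join(out)

    def repair(self, fp: str, recipes: list) -> bool:
        # Inter-file recovery: try every file that references the lost chunk.
        for recipe in recipes:
            for i, f in enumerate(recipe.fingerprints):
                if f != fp:
                    continue
                try:
                    self.chunks[fp] = self._recover(recipe, i)
                    return True
                except UnrecoverableChunkError:
                    pass
        return False
```

For example, if two files share the chunk `b"BBBB"` and both it and a neighbouring chunk of the first file are lost, the first file's own parity group cannot rebuild it, but `repair` can rebuild it from the second file's group, after which the first file is readable again.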
Pages: 2117-2129
Page count: 13