MGRM: A Multi-Segment Greedy Rewriting Method to Alleviate Data Fragmentation in Deduplication-Based Cloud Backup Systems

Times cited: 3
Authors
Zhang, Datong [1]
Deng, Yuhui [1, 2]
Zhou, Yi [3]
Li, Jie [1]
Zhu, Weiheng [1]
Min, Geyong [4]
Affiliations
[1] Jinan Univ, Dept Comp Sci, Guangzhou 510632, Peoples R China
[2] Wuhan Natl Lab Optoelect, Wuhan 430079, Peoples R China
[3] Columbus State Univ, TSYS Sch Comp Sci, Columbus, GA 31907 USA
[4] Univ Exeter, Coll Engn Math & Phys Sci, Dept Comp Sci, Exeter EX4 4QF, England
Funding
National Natural Science Foundation of China
Keywords
Containers; Sorting; Cloud computing; Costs; Redundancy; Mathematical models; Lead; Cloud; data deduplication; fragmentation; rewriting algorithm; data restore
DOI
10.1109/TCC.2022.3214816
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Data deduplication has been widely used in the Cloud because of its ability to save storage space. Capping methods, which rewrite the data chunks of containers with a low Container Reference Ratio (CRR), have been developed to alleviate data fragmentation in the Cloud. Our analysis of real traces shows that some segments point only to low-CRR containers, while others contain only high-CRR containers. Existing capping methods ignore this observation. To address this problem, we propose a multi-segment greedy rewriting method named MGRM. MGRM sorts the containers of segments sequentially: when processing the $i$th segment, MGRM sorts all the containers in the top $i$ segments. This search scope enables MGRM to identify and rewrite the true low-reference container set. Moreover, to balance the deduplication ratio against restore performance, MGRM has two working modes: an optimal rewriting mode and a radical rewriting mode. In the optimal rewriting mode, MGRM aims to improve the deduplication ratio; in the radical rewriting mode, it strives to improve restore performance. MGRM switches between the two modes adaptively according to the workload. Furthermore, unlike existing capping methods, which improve restore performance at the cost of the deduplication ratio, MGRM attends to both. Our extensive experimental results show that MGRM achieves high restore performance together with a high deduplication ratio. In particular, compared with the two state-of-the-art schemes FC and FLC, MGRM improves the deduplication ratio and restore performance by up to 114.83% and 99.34%, respectively.
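The abstract contrasts per-segment capping with MGRM's ranking of containers across all segments processed so far. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' MGRM. Assumptions (all hypothetical): CRR is taken to be the fraction of a container's chunks referenced so far; every container holds a fixed `CHUNKS_PER_CONTAINER` chunks; a segment is a list of container IDs, one per duplicate chunk; and `threshold` and `budget` are illustrative parameters. The optimal/radical working modes and their adaptive switching are not modeled.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical assumption: every container stores this many chunks.
CHUNKS_PER_CONTAINER = 4


def crr(counts: Counter) -> Dict[int, float]:
    """Assumed CRR: referenced chunks / container capacity.
    Repeated references are counted each time, so values may exceed 1."""
    return {cid: n / CHUNKS_PER_CONTAINER for cid, n in counts.items()}


def capping(segment: List[int], cap: int) -> Set[int]:
    """Single-segment capping: keep the `cap` most-referenced containers
    of this segment and rewrite chunks that point to any other container."""
    counts = Counter(segment)
    kept = {cid for cid, _ in counts.most_common(cap)}
    return set(counts) - kept


def greedy_multi_segment(segments: List[List[int]], threshold: float,
                         budget: int) -> List[Set[int]]:
    """Hypothetical multi-segment greedy selection (not the authors' code).

    For segment i, rank every container referenced by segments 0..i by
    cumulative CRR (ascending), then greedily mark containers that segment i
    references for rewriting while their CRR is below `threshold` and the
    per-segment budget of rewritten chunks is not exceeded. The sketch does
    not redirect later references to the rewritten copies.
    """
    cumulative: Counter = Counter()
    decisions: List[Set[int]] = []
    for segment in segments:
        cumulative.update(segment)          # containers seen in segments 0..i
        ratios = crr(cumulative)
        local = Counter(segment)            # chunks this segment would rewrite
        rewrite: Set[int] = set()
        spent = 0
        for cid in sorted(ratios, key=lambda c: (ratios[c], c)):
            if ratios[cid] >= threshold:
                break                       # remaining containers are well referenced
            if cid in local and spent + local[cid] <= budget:
                rewrite.add(cid)
                spent += local[cid]
        decisions.append(rewrite)
    return decisions


if __name__ == "__main__":
    segs = [[1, 1, 1, 1, 2], [2, 3, 3, 3, 3]]
    print([capping(s, cap=1) for s in segs])
    print(greedy_multi_segment(segs, threshold=0.5, budget=2))
```

With these inputs, capping (`cap=1`) rewrites container 2 in both segments, while the cumulative ranking rewrites it only in the first segment: by the second segment, container 2 has two references in total, so its cumulative CRR reaches the 0.5 threshold and it is no longer treated as low-reference.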
Pages: 2503-2516
Number of pages: 14