CareDedup: Cache-aware Deduplication for Reading Performance Optimization in Primary Storage

Cited by: 37
Authors
Lin, Bin [1]
Li, Shanshan [1]
Liao, Xiangke [1]
Liu, Xiaodong [1]
Zhang, Jing [2]
Jia, Zhouyang [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
[2] Sci & Technol Complex Aviat Syst Simulat Lab, Beijing, Peoples R China
Source
2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016) | 2016
Keywords
Deduplication; Fragmentation; Cache;
DOI
10.1109/DSC.2016.56
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deduplication is increasingly used to reduce the cost of primary storage. In practice, it often introduces additional on-disk fragmentation that impairs reading performance. Existing deduplication algorithms mainly focus on static data layout design, so that random I/O requests are largely avoided and the harmful effect is alleviated. However, our trace-driven emulations show that deduplication does not always impair reading: it also offers new opportunities for reading performance optimization by making more cache hits possible. Motivated by this, we propose CareDedup, a novel cache-aware deduplication scheme that leverages these opportunities. Based on a uniform locality assessment algorithm, CareDedup selects the most profitable duplicate blocks to deduplicate in order to maximize reading performance. Our experimental evaluation using real-world traces shows that, compared with sequence-based deduplication algorithms, the duplicate elimination ratio and the reading performance (latency) can both be improved simultaneously. Given a desired duplicate elimination ratio, CareDedup consistently outperforms the sequence-based method, further reducing reading latency by 2-5%.
Pages: 1-10
Page count: 10