User-aware de-duplication algorithm

Cited by: 0
Authors
School of Computer, Wuhan University, Wuhan [1]
430072, China
Unknown [2]
518219, China
Unknown [3]
410000, China
Affiliations
[1] School of Computer, Wuhan University, Wuhan
[2] Standard & Patent Department, Huawei Technologies Co., Ltd, Shenzhen
[3] State Grid Information & Communication Company of Hu'nan Province, Changsha
Source
Ruan Jian Xue Bao | Vol. 10 | pp. 2581-2595
Keywords
Cloud computing; Data deduplication; Data locality; I/O performance bottleneck; Virtual desktop infrastructure
DOI
10.13328/j.cnki.jos.004795
Abstract
Extensive experiments show that, in a virtual desktop infrastructure (VDI) system, the more projects two users share, the more duplicate data they hold. Based on this finding, this paper proposes a user-aware de-duplication algorithm. The algorithm abandons the rule of data locality and works instead under a new rule of user locality. Under this rule, only one user's fingerprint data needs to be loaded into memory for each user group, so the algorithm reduces memory requirements by a factor of 5 to 10 compared with other algorithms. It also keeps the search scope of each duplicate check within a bounded size, which avoids many read I/O operations. In addition, the algorithm adjusts the search scope dynamically according to the current workload of the VDI system, so that it pursues the best de-duplication rate without affecting the system's response time. Results from a prototype show that the approach improves de-duplication performance, especially in a massive data storage system. Compared with OpenDedup, when the fingerprint data is larger than the available memory, the algorithm reportedly reduces read I/O operations by more than 200% and speeds up response time by more than 3x. © Copyright 2015, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
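The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of user locality, not the authors' algorithm. It assumes SHA-1 chunk fingerprints, one in-memory fingerprint set per user group, and a search scope measured in groups that shrinks linearly as load rises. The class `UserAwareDedup`, its methods, and the `load` parameter are all hypothetical names introduced for illustration.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Return the SHA-1 hex digest used here as the chunk fingerprint."""
    return hashlib.sha1(chunk).hexdigest()


class UserAwareDedup:
    """Toy index with one in-memory fingerprint set per user group (assumed design)."""

    def __init__(self, max_scope: int = 4):
        self.group_of = {}          # user -> group id
        self.group_index = {}       # group id -> fingerprints held in memory
        self.max_scope = max_scope  # upper bound on groups searched per check

    def add_user(self, user: str, group: str) -> None:
        self.group_of[user] = group
        self.group_index.setdefault(group, set())

    def search_scope(self, load: float) -> int:
        """Shrink the number of groups searched as load (0.0 to 1.0) rises."""
        load = min(max(load, 0.0), 1.0)
        return max(1, int(self.max_scope * (1.0 - load)))

    def write(self, user: str, chunk: bytes, load: float = 0.0) -> bool:
        """Record a chunk; return True if it was detected as a duplicate."""
        fp = fingerprint(chunk)
        own = self.group_of[user]
        # Check the writer's own group first, then other groups up to the limit.
        candidates = [own] + [g for g in self.group_index if g != own]
        for group in candidates[: self.search_scope(load)]:
            if fp in self.group_index[group]:
                return True
        self.group_index[own].add(fp)
        return False


if __name__ == "__main__":
    dedup = UserAwareDedup(max_scope=2)
    dedup.add_user("alice", "g1")
    dedup.add_user("bob", "g1")
    print(dedup.write("alice", b"block"))  # first copy is stored
    print(dedup.write("bob", b"block"))    # same group, found in memory
```

In this sketch a duplicate that lives outside the current search scope is simply stored again, which trades some de-duplication rate for a bounded lookup cost under load, in the spirit of the trade-off the abstract describes.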
Pages: 2581-2595
Page count: 14