A Dynamic Deduplication Method with Application-Aware in Cloud Environment

Cited by: 0
Authors
He Q. [1]
Bian G. [1]
Shao B. [2]
Jia L. [1]
Affiliations
[1] School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an
[2] School of Management, Xi'an University of Architecture and Technology, Xi'an
Source
Bian, Genqing | 2018 / Xi'an Jiaotong University / 52
Keywords
Cache; Cloud storage; Offline deduplication; Online deduplication
DOI
10.7652/xjtuxb201810004
Abstract
A hybrid deduplication method (Hy-Dedup) is proposed to address the low deduplication efficiency of traditional purely online or purely offline deduplication in cloud storage systems. The method deduplicates data effectively by combining the online and offline modes. In the online deduplication stage, it uses fingerprint caching and clusters the fingerprint indices according to the type of load. The temporal local consistency of the duplicated data in the data stream is estimated, and the spatial local consistency is evaluated by setting different deduplication thresholds, which reduces disk fragmentation. Duplicates that miss the cache because they lack local consistency are handled in the offline deduplication phase. The method significantly reduces duplicated data while maintaining I/O performance and system throughput. Experimental results and a comparison with iDedup show that Hy-Dedup improves the online deduplication ratio by up to 35.9% and reduces the disk capacity requirement by 41.36%. It is concluded that the proposed method can achieve high-deciding deduplication in the cloud storage system, improve deduplication efficiency, and save storage space. © 2018, Editorial Office of Journal of Xi'an Jiaotong University. All rights reserved.
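The abstract gives no implementation details, so the following is only a minimal Python sketch of the general online-plus-offline idea, not the authors' Hy-Dedup. Everything named here is an assumption made for illustration: the class HybridDedup, the per-application LRU fingerprint caches, the use of SHA-1 as the fingerprint, and the rule that a run of consecutive cache hits is deduplicated online only when it is at least as long as a per-application threshold (shorter runs are written out again to limit fragmentation). Whatever the online stage misses is left for a later offline pass over all stored blocks.

```python
import hashlib
from collections import OrderedDict


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of a chunk (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(chunk).hexdigest()


class HybridDedup:
    """Toy hybrid deduplicator; hypothetical, not the paper's implementation."""

    def __init__(self, thresholds, cache_size=4):
        self.thresholds = thresholds  # app type -> minimum duplicate run length
        self.cache_size = cache_size  # entries per per-app fingerprint cache
        self.caches = {}              # app type -> OrderedDict(fp -> block id)
        self.blocks = {}              # block id -> chunk (simulated disk)
        self.stream = []              # logical data: ordered block ids
        self.next_id = 0

    def _store(self, chunk, cache):
        """Write a new physical block and remember its fingerprint (LRU)."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = chunk
        fp = fingerprint(chunk)
        cache[fp] = block_id
        cache.move_to_end(fp)
        while len(cache) > self.cache_size:
            cache.popitem(last=False)  # evict the least recently used entry
        return block_id

    def write(self, app, chunks):
        """Online stage: deduplicate only sufficiently long runs of cache hits."""
        cache = self.caches.setdefault(app, OrderedDict())
        threshold = self.thresholds.get(app, 1)
        run = []  # pending (chunk, cached block id) pairs

        def flush():
            dedup = len(run) >= threshold
            for chunk, hit in run:
                # Long runs reference existing blocks; short runs are rewritten.
                self.stream.append(hit if dedup else self._store(chunk, cache))
            run.clear()

        for chunk in chunks:
            fp = fingerprint(chunk)
            hit = cache.get(fp)
            if hit is not None:
                cache.move_to_end(fp)
                run.append((chunk, hit))
            else:
                flush()
                self.stream.append(self._store(chunk, cache))
        flush()

    def offline_pass(self):
        """Offline stage: full scan that removes duplicates the cache missed."""
        index, remap = {}, {}
        for block_id in sorted(self.blocks):
            fp = fingerprint(self.blocks[block_id])
            remap[block_id] = index.setdefault(fp, block_id)
        self.stream = [remap[b] for b in self.stream]
        for block_id, canonical in remap.items():
            if block_id != canonical:
                del self.blocks[block_id]
        for cache in self.caches.values():
            for fp in list(cache):
                cache[fp] = remap[cache[fp]]
```

In this sketch, a call such as HybridDedup({"vm": 2}, cache_size=2).write("vm", chunks) deduplicates a run of two or more cached chunks immediately, while isolated or evicted duplicates stay on the simulated disk until offline_pass() is run.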
Pages: 24-30
Page count: 6
Related papers
18 in total
  • [1] Wang L., Zhang X., Zhu G., et al., A grouping prediction method based on undirected graph traversal in de-duplication system, Journal of Xi'an Jiaotong University, 47, 10, pp. 51-56, (2013)
  • [2] Nisha T.R., Abirami S., Manohar E., Experimental study on chunking algorithms of data deduplication system on large scale data, Proceedings of the International Conference on Soft Computing Systems, Advances in Intelligent Systems and Computing, pp. 91-98, (2016)
  • [3] Fu Y., Xiao N., Liu F., et al., Deduplication based storage optimization technique for virtual desktop, Journal of Computer Research and Development, 49, pp. 125-130, (2012)
  • [4] Sudhakaran S., Treesa M., A survey on data deduplication in large scale data, International Journal of Computer Applications, 165, 1, pp. 1-4, (2017)
  • [5] Xia W., Jiang H., Feng D., et al., Similarity and locality based indexing for high performance data deduplication, IEEE Transactions on Computers, 64, 4, pp. 1162-1176, (2015)
  • [6] Srinivasan K., Bisson T., Goodson G., et al., iDedup: latency-aware, inline data deduplication for primary storage, Proceedings of the USENIX Conference on File and Storage Technologies, (2012)
  • [7] Mao B., Jiang H., Wu S., et al., POD: performance oriented I/O deduplication for primary storage systems in the cloud, Proceedings of the 2014 IEEE International Parallel and Distributed Processing Symposium, pp. 767-776, (2014)
  • [8] Kaiser J., Suss T., Nagel L., et al., Sorted deduplication: how to process thousands of backup streams, Proceedings of the 2016 32nd Symposium on Mass Storage Systems and Technologies, (2016)
  • [9] Fu Y., Jiang H., Xiao N., et al., AA-dedupe: an application-aware source deduplication approach for cloud backup services in the personal computing environment, Proceedings of the IEEE International Conference on Cluster Computing, pp. 112-120, (2011)
  • [10] Li W., Jean-Baptise G., Riveros J., et al., CacheDedup: in-line deduplication for flash caching, Proceedings of the 14th USENIX Conference on File and Storage Technologies, pp. 301-314, (2016)