Hadoop Based Scalable Cluster Deduplication for Big Data

Cited by: 4
Authors
Liu, Qing [1]
Fu, Yinjin [1]
Ni, Guiqiang [1]
Hou, Rui [2]
Affiliations
[1] PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China
[2] Inst Elect Syst Engn, Beijing, Peoples R China
Source
2016 IEEE 36TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW 2016) | 2016
Keywords
data deduplication; big data; Hadoop; HBase; index management
DOI
10.1109/ICDCSW.2016.17
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The exponential growth of data poses a tremendous challenge to storage systems in data centers. Data deduplication, which detects and eliminates redundant data in a dataset, can greatly reduce the volume of stored data and improve storage space utilization. This paper presents Halodedu, a scalable and reliable cluster deduplication system built on a Hadoop-based cloud computing platform. Halodedu uses MapReduce for parallel deduplication processing and HDFS for data storage management. A local database on each node provides fast, distributed management of the chunk fingerprint index. To keep metadata available and reliable, HBase stores the metadata of backup files. Halodedu is evaluated using virtual machine images as the input dataset, and comparative experiments show that it improves deduplication speed and system scalability.
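
The general idea of fingerprint-based cluster deduplication summarized in the abstract can be illustrated with a small single-process sketch. This is not Halodedu's implementation: the abstract does not state the chunking method, hash function, or routing policy, so fixed-size chunking, SHA-1 fingerprints, and modulo routing are assumptions made here, and the names `Cluster`, `backup`, `restore`, and `route` are hypothetical. Plain dictionaries stand in for the per-node local index database and for the HBase file metadata table.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace (assumption)
NUM_NODES = 3   # hypothetical number of deduplication nodes


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Return the SHA-1 hex digest used as the chunk fingerprint."""
    return hashlib.sha1(piece).hexdigest()


def route(fp: str, num_nodes: int) -> int:
    """Pick the node that owns a fingerprint (simple modulo routing, an assumption)."""
    return int(fp, 16) % num_nodes


class Cluster:
    """Toy model: one fingerprint index per node plus a shared file-recipe table."""

    def __init__(self, num_nodes: int = NUM_NODES):
        # Stand-in for each node's local index database: fingerprint -> chunk bytes.
        self.indexes = [dict() for _ in range(num_nodes)]
        # Stand-in for the file metadata store: file name -> ordered fingerprints.
        self.recipes = {}
        self.logical_bytes = 0  # bytes submitted for backup
        self.stored_bytes = 0   # bytes actually kept after deduplication

    def backup(self, name: str, data: bytes) -> None:
        recipe = []
        for piece in chunk(data):
            fp = fingerprint(piece)
            index = self.indexes[route(fp, len(self.indexes))]
            self.logical_bytes += len(piece)
            if fp not in index:  # store a chunk only the first time it is seen
                index[fp] = piece
                self.stored_bytes += len(piece)
            recipe.append(fp)
        self.recipes[name] = recipe

    def restore(self, name: str) -> bytes:
        """Rebuild a file by looking up each fingerprint on its owning node."""
        return b"".join(
            self.indexes[route(fp, len(self.indexes))][fp]
            for fp in self.recipes[name]
        )
```

Because every fingerprint maps to exactly one node, each node only needs to consult its own index, which is what lets the lookup work be spread across the cluster. A real system would run the per-chunk work in parallel tasks and persist the index and recipes rather than hold them in memory.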
Pages: 98-105
Number of pages: 8