Dynamic Deduplication Decision in a Hadoop Distributed File System

Cited by: 2

Authors:

Chang, Ruay-Shiung ^{[1]}

Liao, Chih-Shan ^{[1]}

Fan, Kuo-Zheng ^{[1]}

Wu, Chia-Ming ^{[1]}

Affiliation:

[1] National Dong Hwa University, Department of Computer Science and Information Engineering, Hualien 974, Taiwan

Source:

International Journal of Distributed Sensor Networks | 2014

Keywords:

CODES;

DOI:

10.1155/2014/630380

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In the big data era, users generate and update data at tremendous speed, from any device, at any time, and from anywhere. Coping with such multiform data in real time is a major challenge. The Hadoop Distributed File System (HDFS) is designed to handle data when building a distributed data center. HDFS keeps duplicate copies of data to increase reliability. However, these duplicates require substantial extra storage space and infrastructure investment. Deduplication can effectively improve the utilization of storage space. In this paper, we propose a dynamic deduplication decision scheme to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the storage space available on a limited number of storage devices. The strategy deletes useless duplicates to free storage space. Experimental results show that our method efficiently improves the storage utilization of a data center using HDFS.


Pages: 14