Dynamic Deduplication Decision in a Hadoop Distributed File System

Cited by: 2
Authors
Chang, Ruay-Shiung [1]
Liao, Chih-Shan [1]
Fan, Kuo-Zheng [1]
Wu, Chia-Ming [1]
Affiliations
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Hualien 974, Taiwan
Keywords
CODES;
DOI
10.1155/2014/630380
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In the big data era, users generate and update data at tremendous speed, from any device, at any time, and anywhere. Coping with such multiform data in real time is a major challenge. The Hadoop Distributed File System (HDFS) is designed to handle data for building a distributed data center. HDFS keeps duplicates of data to increase reliability. However, these duplicates require a lot of extra storage space and infrastructure funding. Deduplication can improve storage utilization effectively. In this paper, we propose a dynamic deduplication decision scheme to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the storage space available on a limited set of storage devices. The strategy deletes unneeded duplicates to free storage space. Experimental results show that the method efficiently improves the storage utilization of a data center using HDFS.
Pages: 14