An enhancement of data locality in Hadoop distributed file system

Cited by: 0
Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry Univ, Dept Comp Sci, Pondicherry, India
[2] Saradha Gangadaran Coll, Dept Comp Sci, Pondicherry, India
[3] KL Univ, Dept CSE, Guntur, Andhra Pradesh, India
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2018 / Vol. 11 / No. 01
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST;
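DOI
Not available
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract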
The MapReduce system is widely used because of advantages such as programming simplicity, fault tolerance, and data distribution. The number of applications based on Hadoop is growing because of its robustness and features. Data locality is a critical issue in parallel data applications, where task processing spends varying amounts of time and resources at particular locations. Several methodologies have been proposed to enhance data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then, the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes) and dynamically dispatches the data blocks based on their locality. This reduces the collision of vulnerability and network traffic and increases the throughput of the Hadoop system. With the help of an analytical model, the execution time of every task is identified, which detects jobs with a data locality problem. Then, a hash table is built that maps data blocks to nodes. Under data locality, a program is transferred to the node where the original data is placed. Experiments conducted on two real-world data sets with different data placement approaches show that the proposed methodology reduces execution time and improves performance by 42.5%, outperforming the existing methods.
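The abstract mentions a hash table from data blocks to nodes and the idea of moving the program to the node that holds the data. The following is a minimal Python sketch of that general idea only. It is not the paper's FDP algorithm, and the names `build_block_table`, `assign_task`, `placements`, and `free_slots` are hypothetical, introduced purely for illustration.

```python
from collections import defaultdict


def build_block_table(placements):
    """Build a hash table mapping each block ID to the nodes that store it.

    `placements` is an assumed input: {node_name: [block_id, ...]}.
    """
    table = defaultdict(set)
    for node, blocks in placements.items():
        for block in blocks:
            table[block].add(node)
    return table


def assign_task(block, table, free_slots):
    """Pick a node for the task that reads `block`.

    Prefers a node that stores the block and has a free slot (data-local).
    Otherwise falls back to any node with a free slot (non-local).
    Returns (node, is_local), or (None, False) if no slot is free.
    """
    local = [n for n in sorted(table.get(block, ())) if free_slots.get(n, 0) > 0]
    if local:
        node, is_local = max(local, key=lambda n: free_slots[n]), True
    else:
        candidates = [n for n in sorted(free_slots) if free_slots[n] > 0]
        if not candidates:
            return None, False
        node, is_local = max(candidates, key=lambda n: free_slots[n]), False
    free_slots[node] -= 1  # the chosen node now has one fewer free slot
    return node, is_local


if __name__ == "__main__":
    placements = {"node1": ["b1", "b2"], "node2": ["b2", "b3"]}
    table = build_block_table(placements)
    free_slots = {"node1": 1, "node2": 1, "node3": 2}
    for block in ["b1", "b2", "b3"]:
        print(block, assign_task(block, table, free_slots))
```

In this sketch a task counts as data-local only when it runs on a node that already stores its block. A real scheduler would also weigh rack topology and replica counts, which are left out here.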
Pages: 123 - 133
Number of pages: 11
Related papers
50 in total
  • [21] Performance Study on Indexing and Accessing of Small File in Hadoop Distributed File System
    Rodrigues, Anisha P.
    Fernandes, Roshan
    Vijaya, P.
    Chander, Satish
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2021, 20 (04)
  • [22] A Review on Data locality in Hadoop MapReduce
    Sharma, Anil
    Singh, Gurwinder
    2018 FIFTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (IEEE PDGC), 2018, : 723 - 728
  • [23] Customized Web User Interface for Hadoop Distributed File System
    Krishna, T. Lakshmi Siva Rama
    Ragunathan, T.
    Battula, Sudheer Kumar
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 2, 2016, 380 : 567 - 576
  • [24] Research of Cloud Storage Based on Hadoop Distributed File System
    Han, Yongqi
    Zhang, Yun
    Yu, Shui
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 2472 - 2475
  • [25] Modeling and Simulation of Hadoop Distributed File System in a Cluster of Workstations
    Aguilera-Mendoza, Longendri
    Llorente-Quesada, Monica T.
    MODEL AND DATA ENGINEERING, MEDI 2013, 2013, 8216 : 1 - 12
  • [26] Towards a Better Replica Management for Hadoop Distributed File System
    Ciritoglu, Hilmi Egemen
    Saber, Takfarinas
    Buda, Teodora Sandra
    Murphy, John
    Thorpe, Christina
    2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 104 - 111
  • [27] A Distributed NameNode Cluster for a Highly-Available Hadoop Distributed File System
    Kim, Yonghwan
    Araragi, Tadashi
    Nakamura, Junya
    Masuzawa, Toshimitsu
    2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS), 2014, : 333 - 334
  • [28] Forensic Investigation Using RAM Analysis on the Hadoop Distributed File System
    Laing, Stuart
    Ludwiniak, Robert
    El Boudani, Brahim
    Chrysoulas, Christos
    Ubakanma, George
    Pitropakis, Nikolaos
    2023 19TH INTERNATIONAL CONFERENCE ON THE DESIGN OF RELIABLE COMMUNICATION NETWORKS, DRCN, 2023,
  • [29] Performance Evaluation and Tuning for MapReduce Computing in Hadoop Distributed File System
    Kim, Jongyeop
    Kumar, Ashwin T. K.
    George, K. M.
    Park, Nohpill
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2015, : 62 - 68
  • [30] A Distributed and Cooperative NameNode Cluster for a Highly-Available Hadoop Distributed File System
    Kim, Yonghwan
    Araragi, Tadashi
    Nakamura, Junya
    Masuzawa, Toshimitsu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (04) : 835 - 851