An enhancement of data locality in Hadoop distributed file system

Cited by: 0
Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry Univ, Dept Comp Sci, Pondicherry, India
[2] Saradha Gangadaran Coll, Dept Comp Sci, Pondicherry, India
[3] KL Univ, Dept CSE, Guntur, Andhra Pradesh, India
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2018, Vol. 11, No. 01
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST
DOI
Not available
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
MapReduce has become widely adopted because of advantages such as programming simplicity, fault tolerance, and data distribution. The number of applications built on Hadoop is growing because of its robustness and features. Data locality is a critical issue in data-parallel applications, where task processing spends varying amounts of time and resources at particular locations. Several methods have been proposed to enhance data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes), dynamically dispatching the data blocks based on their locality. This reduces the collision of vulnerability and network traffic and increases the throughput of the Hadoop system. Using an analytical model, the execution time of every task is determined, which identifies jobs with a data locality problem. A hash table is then built that maps data blocks to nodes. With data locality, a program is transferred to the node where the original data is placed. Experiments conducted on two real-world data sets with different data placement approaches show that the proposed method reduces execution time and improves performance by 42.5%, outperforming existing methods.
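The abstract mentions a hash table from data blocks to nodes and the idea of moving a program to the node that holds its data. The sketch below illustrates only that general idea; it is not the paper's FDP algorithm, whose details are not given here. The function names (`build_block_index`, `assign_tasks`), the node and block labels, and the slot counts are all hypothetical.

```python
from collections import defaultdict


def build_block_index(placement):
    """Build a hash table mapping each block ID to the set of nodes storing it.

    `placement` is a hypothetical dict: node name -> list of block IDs on that node.
    """
    index = defaultdict(set)
    for node, blocks in placement.items():
        for block in blocks:
            index[block].add(node)
    return index


def assign_tasks(blocks, index, free_slots):
    """Assign one task per block, preferring a node that already holds the block.

    Returns a dict: block ID -> (chosen node, True if the task is data-local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block in blocks:
        # Nodes that store the block and still have a free task slot.
        local = [n for n in sorted(index.get(block, ())) if slots.get(n, 0) > 0]
        if local:
            node, is_local = max(local, key=lambda n: slots[n]), True
        else:
            # No local slot left: fall back to any node with capacity (remote read).
            candidates = [n for n in sorted(slots) if slots[n] > 0]
            if not candidates:
                raise RuntimeError("no free task slots left in the cluster")
            node, is_local = max(candidates, key=lambda n: slots[n]), False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


# Hypothetical three-node cluster.
placement = {"node1": ["b1", "b2"], "node2": ["b2", "b3"], "node3": []}
free_slots = {"node1": 1, "node2": 1, "node3": 2}

index = build_block_index(placement)
assignment = assign_tasks(["b1", "b2", "b3"], index, free_slots)
local_count = sum(1 for _, is_local in assignment.values() if is_local)
print(assignment, local_count)
```

In this toy run, the task for `b3` cannot run locally because the only node holding it has no free slot, so it falls back to a remote node. A placement strategy that spreads blocks according to available capacity would aim to reduce such fallbacks.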
Pages: 123-133
Page count: 11
Related papers
50 in total
[31]   A Distributed and Cooperative NameNode Cluster for a Highly-Available Hadoop Distributed File System [J].
Kim, Yonghwan ;
Araragi, Tadashi ;
Nakamura, Junya ;
Masuzawa, Toshimitsu .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (04) :835-851
[32]   Optimizing Read Operations of Hadoop Distributed File System on Heterogeneous Storages [J].
Lee, Jongbaeg ;
Lee, Jongwuk ;
Lee, Sang-Won .
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2021, 37 (03) :709-729
[33]   SD-HDFS: Secure Deletion in Hadoop Distributed File System [J].
Agrawal, Bikash ;
Hansen, Raymond ;
Rong, Chunming ;
Wiktorski, Tomasz .
2016 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2016, 2016, :181-189
[34]   Attribute based honey encryption algorithm for securing big data: Hadoop distributed file system perspective [J].
Kapil, Gayatri ;
Agrawal, Alka ;
Attaallah, Abdulaziz ;
Algarni, Abdullah ;
Kumar, Rajeev ;
Khan, Raees Ahmad .
PEERJ COMPUTER SCIENCE, 2020, 2020 (02) :1-31
[35]   OPTIMIZING HADOOP DATA LOCALITY: PERFORMANCE ENHANCEMENT STRATEGIES IN HETEROGENEOUS COMPUTING ENVIRONMENTS [J].
Kim, Si-Yeong ;
Kim, Tai-Hoon .
SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2024, 25 (06) :4558-4575
[36]   A NOVEL APPROACH FOR REPLICA SYNCHRONIZATION IN HADOOP DISTRIBUTED FILE SYSTEMS [J].
Vini, Miss. J. ;
Nallathamby, Rachel ;
Robin, C. R. Rene .
BIG DATA, CLOUD AND COMPUTING CHALLENGES, 2015, 50 :590-595
[37]   Scheduling in Big Data Heterogeneous Distributed System Using Hadoop [J].
Thakkar, Shraddha ;
Patel, Sanjay .
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 :119-131
[38]   Sandbox security model for Hadoop file system [J].
Begum, Gousiya ;
Ul Huq, S. Zahoor ;
Kumar, A. P. Siva .
JOURNAL OF BIG DATA, 2020, 7 (01)
[40]   Hadoop Performance Analysis Model with Deep Data Locality [J].
Lee, Sungchul ;
Jo, Ju-Yeon ;
Kim, Yoohwan .
INFORMATION, 2019, 10 (07)