An enhancement of data locality in Hadoop distributed file system

Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry University, Department of Computer Science, Pondicherry, India
[2] Saradha Gangadaran College, Department of Computer Science, Pondicherry, India
[3] KL University, Department of CSE, Guntur, Andhra Pradesh, India
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2018, Vol. 11, No. 1
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST
DOI
Not available
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline codes
071005; 0836; 090102; 100705
Abstract
MapReduce has become widely adopted because of advantages such as programming simplicity, fault tolerance, and data distribution, and the number of Hadoop-based applications keeps growing because of its robustness and features. Data locality is a critical issue in data-parallel applications, where task processing spends varying amounts of time and resources at particular locations. Several methods have been proposed to improve data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then, the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes) and dynamically dispatches the data blocks based on their locality. This reduces collision vulnerability and network traffic and increases the throughput of the Hadoop system. An analytical model is used to determine the execution time of every task, which identifies the jobs that suffer from data locality problems. Then, a hash table mapping data blocks to nodes is built. Under data locality, a program is moved to the node where the original data is stored. Experiments on two real-world datasets with different data placement approaches show that the proposed method reduces execution time and improves performance by 42.5%, outperforming the existing methods.
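The general idea the abstract relies on, building a hash table from data blocks to the nodes that store them and moving each task to a node that already holds its block, can be illustrated with a small Python sketch. This is not the authors' FDP algorithm or analytical model: the tiny block size, the round-robin placement with two replicas (plain replication is used here in place of erasure coding), the per-node slot counts, the greedy scheduler, and every function name (`split_into_blocks`, `place_blocks`, `schedule`, `locality_ratio`) are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical tiny block size (bytes) so the example stays readable;
# real HDFS blocks are far larger.
BLOCK_SIZE = 4


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Divide a dataset into fixed-size data blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str], replicas: int = 2) -> Dict[int, Set[str]]:
    """Build a hash table (dict) mapping each block id to the nodes storing it.

    Assumption for this sketch only: round-robin placement with plain
    replication, standing in for the paper's erasure-coded FDP placement.
    """
    table: Dict[int, Set[str]] = {}
    for b in range(num_blocks):
        table[b] = {nodes[(b + r) % len(nodes)] for r in range(replicas)}
    return table


def schedule(block_ids: List[int], table: Dict[int, Set[str]],
             nodes: List[str], slots_per_node: int) -> Dict[int, str]:
    """Assign one task per block, preferring a node that already holds the block."""
    free = {n: slots_per_node for n in nodes}
    assignment: Dict[int, str] = {}
    for b in block_ids:
        # Look up the block's holders in the hash table and keep those with a free slot.
        local = [n for n in sorted(table[b]) if free[n] > 0]
        if local:
            node = local[0]  # data-local: move the computation to the data
        else:
            # Remote fallback: pick the node with the most free slots;
            # the block would have to cross the network.
            node = max(nodes, key=lambda n: free[n])
            if free[node] == 0:
                raise RuntimeError("no free task slots left")
        free[node] -= 1
        assignment[b] = node
    return assignment


def locality_ratio(assignment: Dict[int, str], table: Dict[int, Set[str]]) -> float:
    """Fraction of tasks running on a node that stores their input block."""
    if not assignment:
        return 0.0
    local = sum(1 for b, n in assignment.items() if n in table[b])
    return local / len(assignment)


if __name__ == "__main__":
    data = b"hadoop data locality example!"
    blocks = split_into_blocks(data)
    nodes = ["dn1", "dn2", "dn3"]
    table = place_blocks(len(blocks), nodes)
    assignment = schedule(list(range(len(blocks))), table, nodes, slots_per_node=3)
    print(len(blocks), assignment, locality_ratio(assignment, table))
```

A task counts as data-local when its assigned node appears in the block's hash-table entry. When none of the block's holders has a free slot, the sketch falls back to the node with the most free slots, which is exactly the case that generates the network traffic a locality-aware placement tries to avoid.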
Pages: 123-133
Number of pages: 11