An enhancement of data locality in Hadoop distributed file system

Times cited: 0
Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry Univ, Dept Comp Sci, Pondicherry, India
[2] Saradha Gangadaran Coll, Dept Comp Sci, Pondicherry, India
[3] KL Univ, Dept CSE, Guntur, Andhra Pradesh, India
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2018 / Vol. 11 / No. 01
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST
DOI
Not available
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
The MapReduce system has become prevalent because of its advantages, such as programming simplicity, fault tolerance and data distribution. The number of Hadoop-based applications is growing because of Hadoop's robustness and features. Data locality is a critical issue in parallel data-processing applications, where tasks spend varying amounts of time and resources at particular locations. Several methodologies have been proposed to enhance data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then, the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes) and dynamically dispatches the data blocks based on their locality. This reduces collision vulnerability and network traffic and increases the throughput of the Hadoop system. An analytical model estimates the execution time of every task, which identifies the jobs that have a data locality problem. A hash table mapping data blocks to nodes is then built. Under data locality, the program is transferred to the node where the original data is placed. Experiments conducted on two real-world data sets with different data placement approaches show that the proposed methodology reduces execution time and improves performance by 42.5%, outperforming the existing methods.
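The abstract states that data blocks are "encoded with erasure coding" but does not name the code. The following is a minimal sketch that assumes the simplest possible erasure code, a single XOR parity block that can repair exactly one lost block. It shows how a dataset is split into blocks and how a lost block is rebuilt. The function names are hypothetical. This is not the authors' scheme: HDFS erasure coding in Hadoop 3.x ships Reed-Solomon policies (for example RS-6-3-1024k) that tolerate several losses.

```python
from typing import List, Optional


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split a dataset into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def xor_parity(blocks: List[bytes]) -> bytes:
    """Return one parity block: the bytewise XOR of all given blocks.

    Shorter blocks are zero-padded to the length of the longest block.
    """
    width = max(len(b) for b in blocks)
    acc = bytearray(width)
    for blk in blocks:
        for i, byte in enumerate(blk.ljust(width, b"\x00")):
            acc[i] ^= byte
    return bytes(acc)


def recover_block(blocks: List[Optional[bytes]], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single missing block (marked None) from survivors and parity.

    XOR-ing the parity with every surviving block cancels them out and leaves
    the missing block (zero-padded); `lost_len` trims the padding off again.
    """
    if sum(b is None for b in blocks) != 1:
        raise ValueError("a single XOR parity block can repair exactly one lost block")
    survivors = [b for b in blocks if b is not None]
    return xor_parity(survivors + [parity])[:lost_len]


if __name__ == "__main__":
    data = b"hello hadoop data locality!"
    blocks = split_into_blocks(data, 8)
    parity = xor_parity(blocks)
    damaged = list(blocks)
    damaged[1] = None  # simulate losing the node that held block 1
    print(recover_block(damaged, parity, len(blocks[1])) == blocks[1])
```

A single parity block stores only one extra block instead of full replicas, which is the storage-cost trade-off that motivates erasure coding. It can only repair one lost block, however, so real deployments use codes with more parity blocks.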
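The abstract describes two further steps: building a hash table from data blocks to nodes, and moving the program to the node that already holds the data. It gives no details of the FDP algorithm itself. The sketch below is a generic locality-first greedy scheduler, not FDP. `build_block_index`, `assign_tasks`, `locality_ratio`, the slot counts, and the node and block names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical result type: block ID -> (assigned node, ran data-local?)
Assignment = Dict[str, Tuple[str, bool]]


def build_block_index(placement: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build a hash table mapping each block ID to the nodes that store it.

    `placement` maps node name -> block IDs held on that node.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for node, blocks in placement.items():
        for block_id in blocks:
            index[block_id].add(node)
    return dict(index)


def assign_tasks(tasks: List[str], index: Dict[str, Set[str]],
                 free_slots: Dict[str, int]) -> Assignment:
    """Greedily assign one task per block, preferring a data-local node.

    Falls back to any node with a free slot (a remote read) when no node
    holding the block has capacity left.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Assignment = {}
    for block_id in tasks:
        local = [n for n in sorted(index.get(block_id, set())) if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            remote = [n for n in sorted(slots) if slots[n] > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block_id}")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


def locality_ratio(assignment: Assignment) -> float:
    """Fraction of tasks that run on a node already holding their block."""
    if not assignment:
        return 0.0
    return sum(is_local for _, is_local in assignment.values()) / len(assignment)


if __name__ == "__main__":
    placement = {"node1": ["b1", "b2"], "node2": ["b2", "b3"], "node3": ["b4"]}
    index = build_block_index(placement)
    plan = assign_tasks(["b1", "b2", "b3", "b4"], index,
                        {"node1": 3, "node2": 1, "node3": 0})
    print(plan, locality_ratio(plan))
```

In this example node3 has no free slot, so block b4 falls back to a remote read on node1, and three of the four tasks run data-local. Taking the first free local node keeps the sketch short. A real scheduler would also weigh load balance and the estimated cost of each remote transfer.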
Pages: 123–133
Number of pages: 11