An enhancement of data locality in Hadoop distributed file system

Cited by: 0
Authors
Reddy, A. Siva Krishna [1]
Sujatha, Pothula [1]
Koti, Prasad [2]
Dhavachelvan, P. [1]
Amudhavel, J. [3]
Affiliations
[1] Pondicherry Univ, Dept Comp Sci, Pondicherry, India
[2] Saradha Gangadaran Coll, Dept Comp Sci, Pondicherry, India
[3] KL Univ, Dept CSE, Guntur, Andhra Pradesh, India
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2018, Vol. 11, No. 01
Keywords
DATA PLACEMENT; DISK CONSUMPTION; HADOOP; MAPREDUCE; STORAGE COST;
DOI
Not available
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
MapReduce has become widely used because of advantages such as programming simplicity, fault tolerance, and data distribution. The number of applications based on Hadoop is growing because of its robustness and features. Data locality is a critical issue in data-parallel applications, where task processing spends varying amounts of time and resources at particular locations. Several methodologies have been proposed to enhance data locality. In this paper, we identify the data placement (DP) problem across nodes and improve data locality. First, the MapReduce system divides the dataset into smaller subsets called data blocks. These data blocks are encoded with erasure coding to achieve reliability. Then, the Flexible Data Placement (FDP) algorithm is applied to the slave nodes (data nodes) and dynamically dispatches the data blocks based on their locality. This reduces the collision of vulnerability and network traffic and increases the throughput of the Hadoop system. With the help of an analytical model, the execution time of every task is determined, which detects jobs with a data locality problem. Then, a hash table is built that maps data blocks to nodes. Under data locality, a program is transferred to the node where the original data is placed. Experiments conducted on two real-world data sets with different data placement approaches show that the proposed methodology reduces execution time and improves performance by 42.5%, outperforming the existing methods.
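The hash-table and locality ideas mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's FDP algorithm, whose details the abstract does not give; the function names (`build_block_table`, `assign_tasks`, `locality_ratio`) and the least-loaded tie-breaking rule are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_block_table(placements):
    """Build a hash table mapping each block id to the set of nodes storing it.

    `placements` is a hypothetical dict: node name -> list of block ids.
    """
    table = defaultdict(set)
    for node, blocks in placements.items():
        for block in blocks:
            table[block].add(node)
    return table


def assign_tasks(blocks, table, nodes):
    """Assign one task per block, moving the computation to the data.

    Prefers the least-loaded node that already holds the block; if no node
    holds it, falls back to the least-loaded node overall (a remote read).
    Returns block id -> (chosen node, is_local flag).
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    for block in blocks:
        local = {n for n in table.get(block, ()) if n in load}
        candidates = local if local else set(load)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        chosen = min(sorted(candidates), key=lambda n: load[n])
        load[chosen] += 1
        assignment[block] = (chosen, bool(local))
    return assignment


def locality_ratio(assignment):
    """Fraction of tasks that run on a node holding their input block."""
    if not assignment:
        return 0.0
    local = sum(1 for _, is_local in assignment.values() if is_local)
    return local / len(assignment)


# Toy cluster (assumed data): three nodes, block b4 is stored nowhere.
placements = {"n1": ["b1", "b2"], "n2": ["b2", "b3"], "n3": []}
table = build_block_table(placements)
assignment = assign_tasks(["b1", "b2", "b3", "b4"], table, list(placements))
ratio = locality_ratio(assignment)
```

In this toy setup, three of the four tasks run on a node that already stores their block, so the locality ratio is 0.75; the task for `b4` has no local replica and is sent to the idle node `n3`.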
Pages: 123-133
Page count: 11