Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

Cited by: 21
Authors
Hua, Xiayu [1]
Wu, Hao [1]
Li, Zheng [1]
Ren, Shangping [1]
Affiliations
[1] Illinois Institute of Technology (IIT), Dept Comp Sci, Chicago, IL 60616 USA
Funding
U.S. National Science Foundation (NSF); U.S. National Aeronautics and Space Administration (NASA)
Keywords
HDFS; Interaction-intensive task; Cache; Hierarchical structure; PSO; Storage allocation algorithm
DOI
10.1016/j.jpdc.2014.03.010
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone, general-purpose distributed file system (HDFS User Guide, 2008). It provides access to bulk data with high I/O throughput, which makes it suitable for applications with large I/O data sets. However, the performance of HDFS decreases dramatically when handling interaction-intensive files, i.e., files that are relatively small but frequently accessed. The paper analyzes the cause of the throughput degradation when accessing interaction-intensive files and presents an enhanced HDFS architecture, along with an associated storage allocation algorithm, that overcomes the performance degradation problem. Experiments show that with the proposed architecture and its associated storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for large-data-set tasks. (C) 2014 Elsevier Inc. All rights reserved.
Pages: 2770-2779
Number of pages: 10