Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

Cited by: 21
Authors
Hua, Xiayu [1]
Wu, Hao [1]
Li, Zheng [1]
Ren, Shangping [1]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Funding
National Science Foundation (USA); National Aeronautics and Space Administration (NASA)
Keywords
HDFS; Interaction intensive task; Cache; Hierarchical structure; PSO; Storage allocation algorithm
DOI
10.1016/j.jpdc.2014.03.010
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone, general-purpose distributed file system (HDFS User Guide, 2008). It provides access to bulk data with high I/O throughput, which makes it suitable for applications with large I/O data sets. However, the performance of HDFS decreases dramatically when it handles interaction-intensive files, i.e., files that are relatively small but frequently accessed. The paper analyzes the cause of this throughput degradation and presents an enhanced HDFS architecture, along with an associated storage allocation algorithm, that overcomes the problem. Experiments show that with the proposed architecture and storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for large-data-set tasks. © 2014 Elsevier Inc. All rights reserved.
Pages: 2770-2779
Number of pages: 10