G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Cited by: 232
Authors
Wang, Lizhe [1 ,2 ]
Tao, Jie [3 ]
Ranjan, Rajiv [4 ]
Marten, Holger [3 ]
Streit, Achim [3 ,6 ]
Chen, Jingying [5 ]
Chen, Dan [1 ]
Affiliations
[1] China Univ Geosci, Sch Comp, Wuhan 430074, Peoples R China
[2] Chinese Acad Sci, Ctr Earth Observat & Digital Earth, Beijing 100864, Peoples R China
[3] Karlsruhe Inst Technol, Steinbuch Ctr Comp, D-76021 Karlsruhe, Germany
[4] CSIRO, ICT Ctr, Informat Engn Lab, Canberra, ACT, Australia
[5] Cent China Normal Univ, Natl Engn Ctr E Learning, Beijing, Peoples R China
[6] Karlsruhe Inst Technol, Inst Telemat, Dept Informat, D-76021 Karlsruhe, Germany
Source
Future Generation Computer Systems: The International Journal of eScience | 2013 / Vol. 29 / No. 3
Funding
National Natural Science Foundation of China
Keywords
Cloud computing; Massive data processing; Data-intensive computing; Hadoop; MapReduce; CLOUD
DOI
10.1016/j.future.2012.09.001
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Recently, the computational requirements for large-scale, data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP), for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale, data-intensive computing applications. However, current MapReduce implementations are designed to run within a single cluster and cannot be used for large-scale distributed data processing across multiple clusters. Workflow systems, on the other hand, are used for distributed data processing across data centers, but the workflow paradigm has been reported to have limitations for this purpose, notably in reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. (C) 2012 Elsevier B.V. All rights reserved.
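The abstract contrasts single-cluster MapReduce with MapReduce spread over several clusters. As a rough illustration of that idea, here is a minimal single-process Python sketch assuming a word-count job: each hypothetical cluster maps and locally combines only its own data, and the partial counts are then hash-partitioned to reducers. This is not G-Hadoop's implementation or API; the cluster names and the functions `map_and_combine`, `partition`, and `run_multi_cluster_job` are hypothetical.

```python
# Hypothetical sketch of MapReduce across several clusters; not G-Hadoop's code or API.
from collections import Counter, defaultdict
from typing import Dict, List
import zlib


def map_and_combine(lines: List[str]) -> Counter:
    """Map each word to a count of 1, then combine locally so only
    one partial count per distinct word leaves the cluster."""
    partial: Counter = Counter()
    for line in lines:
        for word in line.lower().split():
            partial[word] += 1
    return partial


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner (crc32) that assigns a key to a reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_multi_cluster_job(clusters: Dict[str, List[str]], num_reducers: int = 2) -> Dict[str, int]:
    # One bucket per reducer: word -> list of partial counts from different clusters.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    # Map phase: each cluster processes only its local data (data locality).
    for lines in clusters.values():
        # Shuffle: route each combined pair to its reducer, which may sit in another cluster.
        for word, count in map_and_combine(lines).items():
            buckets[partition(word, num_reducers)][word].append(count)
    # Reduce phase: sum the partial counts for each word.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for word, counts in bucket.items():
            result[word] = sum(counts)
    return result


if __name__ == "__main__":
    clusters = {
        "cluster_a": ["lhc event data", "event data"],
        "cluster_b": ["data analysis", "lhc data"],
    }
    print(run_multi_cluster_job(clusters))
```

The local combine step is a design choice of this sketch, not something the abstract states: each cluster sends one partial count per distinct word rather than one record per occurrence, which matters when the shuffle crosses wide-area links between data centers. The final counts do not depend on `num_reducers`; only the placement of the reduce work changes.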
Pages: 739-750
Page count: 12