G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Cited by: 232
Authors
Wang, Lizhe [1 ,2 ]
Tao, Jie [3 ]
Ranjan, Rajiv [4 ]
Marten, Holger [3 ]
Streit, Achim [3 ,6 ]
Chen, Jingying [5 ]
Chen, Dan [1 ]
Affiliations
[1] China Univ Geosci, Sch Comp, Wuhan 430074, Peoples R China
[2] Chinese Acad Sci, Ctr Earth Observat & Digital Earth, Beijing 100864, Peoples R China
[3] Karlsruhe Inst Technol, Steinbuch Ctr Comp, D-76021 Karlsruhe, Germany
[4] CSIRO, ICT Ctr, Informat Engn Lab, Canberra, ACT, Australia
[5] Cent China Normal Univ, Natl Engn Ctr E Learning, Beijing, Peoples R China
[6] Karlsruhe Inst Technol, Inst Telemat, Dept Informat, D-76021 Karlsruhe, Germany
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2013 / Vol. 29 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
Cloud computing; Massive data processing; Data-intensive computing; Hadoop; MapReduce; CLOUD;
DOI
10.1016/j.future.2012.09.001
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed on more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are developed to operate on single cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. On the other hand, workflow systems are used for distributed data processing across data centers. It has been reported that the workflow paradigm has some limitations for distributed data processing, such as reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. © 2012 Elsevier B.V. All rights reserved.
Pages: 739-750
Page count: 12
Related papers
50 in total
  • [1] A security framework in G-Hadoop for big data computing across distributed Cloud data centres
    Zhao, Jiaqi
    Wang, Lizhe
    Tao, Jie
    Chen, Jinjun
    Sun, Weiye
    Ranjan, Rajiv
    Kolodziej, Joanna
    Streit, Achim
    Georgakopoulos, Dimitrios
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2014, 80 (05) : 994 - 1007
  • [2] MapReduce Across Distributed Clusters for Data-intensive Applications
    Wang, Lizhe
    Tao, Jie
    Marten, Holger
    Streit, Achim
    Khan, Samee U.
    Kolodziej, Joanna
    Chen, Dan
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2004 - 2011
  • [3] Data-Intensive Workload Consolidation for the Hadoop Distributed File System
    Moraveji, Reza
    Taheri, Javid
    Reza, Mohammad
    Rizvandi, Nikzad Babaii
    Zomaya, Albert Y.
    2012 ACM/IEEE 13TH INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID), 2012, : 95 - 103
  • [4] Data-Intensive Computing Modules for Teaching Parallel and Distributed Computing
    Gowanlock, Michael
    Gallet, Benoit
    2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 350 - 357
  • [5] Data-Intensive Text Processing with MapReduce
    Xu, Peng
    COMPUTATIONAL LINGUISTICS, 2011, 37 (03) : 635 - 637
  • [6] Distributed Data Access/Find System with Metadata for Data-Intensive Computing
    Ikebe, Minoru
    Inomata, Atsuo
    Fujikawa, Kazutoshi
    Sunahara, Hideki
    2008 9TH IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING, 2008, : 361 - 366
  • [7] Nebula: Distributed Edge Cloud for Data-Intensive Computing
    Ryden, Mathew
    Oh, Kwangsung
    Chandra, Abhishek
    Weissman, Jon
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON COLLABORATION TECHNOLOGIES AND SYSTEMS (CTS), 2014, : 491 - 492
  • [8] Distributed Data Provenance for Large-Scale Data-Intensive Computing
    Zhao, Dongfang
    Shou, Chen
    Malik, Tanu
    Raicu, Ioan
    2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2013,
  • [9] Software Design and Implementation for MapReduce across Distributed Data Centers
    Wang, Lizhe
    Tao, Jie
    Ma, Yan
    Khan, Samee U.
    Kolodziej, Joanna
    Chen, Dan
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 : 85 - 90
  • [10] Applications in Data-Intensive Computing
    Shah, Anuj R.
    Adkins, Joshua N.
    Baxter, Douglas J.
    Cannon, William R.
    Chavarria-Miranda, Daniel G.
    Choudhury, Sutanay
    Gorton, Ian
    Gracio, Deborah K.
    Halter, Todd D.
    Jaitly, Navdeep D.
    Johnson, John R.
    Kouzes, Richard T.
    Macduff, Matthew C.
    Marquez, Andres
    Monroe, Matthew E.
    Oehmen, Christopher S.
    Pike, William A.
    Scherrer, Chad
    Villa, Oreste
    Webb-Robertson, Bobbie-Jo
    Whitney, Paul D.
    Zuljevic, Nino
    ADVANCES IN COMPUTERS, VOL 79, 2010, 79 : 1 - 70