G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Cited by: 242
Authors
Wang, Lizhe [1 ,2 ]
Tao, Jie [3 ]
Ranjan, Rajiv [4 ]
Marten, Holger [3 ]
Streit, Achim [3 ,6 ]
Chen, Jingying [5 ]
Chen, Dan [1 ]
Affiliations
[1] China Univ Geosci, Sch Comp, Wuhan 430074, Peoples R China
[2] Chinese Acad Sci, Ctr Earth Observat & Digital Earth, Beijing 100864, Peoples R China
[3] Karlsruhe Inst Technol, Steinbuch Ctr Comp, D-76021 Karlsruhe, Germany
[4] CSIRO, ICT Ctr, Informat Engn Lab, Canberra, ACT, Australia
[5] Cent China Normal Univ, Natl Engn Ctr E Learning, Beijing, Peoples R China
[6] Karlsruhe Inst Technol, Inst Telemat, Dept Informat, D-76021 Karlsruhe, Germany
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2013 / Vol. 29 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
Cloud computing; Massive data processing; Data-intensive computing; Hadoop; MapReduce; CLOUD;
DOI
10.1016/j.future.2012.09.001
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP), for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are developed to operate in single-cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. On the other hand, workflow systems are used for distributed data processing across data centers. It has been reported that the workflow paradigm has some limitations for distributed data processing, such as reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. © 2012 Elsevier B.V. All rights reserved.
Pages: 739-750
Number of pages: 12