G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Cited by: 239

Authors:

Wang, Lizhe [1,2]

Tao, Jie [3]

Ranjan, Rajiv [4]

Marten, Holger [3]

Streit, Achim [3,6]

Chen, Jingying [5]

Chen, Dan [1]

Affiliations:

[1] China Univ Geosci, Sch Comp, Wuhan 430074, Peoples R China

[2] Chinese Acad Sci, Ctr Earth Observat & Digital Earth, Beijing 100864, Peoples R China

[3] Karlsruhe Inst Technol, Steinbuch Ctr Comp, D-76021 Karlsruhe, Germany

[4] CSIRO, ICT Ctr, Informat Engn Lab, Canberra, ACT, Australia

[5] Cent China Normal Univ, Natl Engn Ctr E Learning, Beijing, Peoples R China

[6] Karlsruhe Inst Technol, Inst Telemat, Dept Informat, D-76021 Karlsruhe, Germany

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2013 / Vol. 29 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Cloud computing; Massive data processing; Data-intensive computing; Hadoop; MapReduce; CLOUD;

DOI:

10.1016/j.future.2012.09.001

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed on more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are developed to operate on single cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. On the other hand, workflow systems are used for distributed data processing across data centers. It has been reported that the workflow paradigm has some limitations for distributed data processing, such as reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. (C) 2012 Elsevier B.V. All rights reserved.

Pages: 739-750

Page count: 12

50 items in total

[31] Improvement Of Data Throughput In Data-Intensive Cloud Computing Applications
Ibrahim, Ibrahim Adel
Bassiouni, Mostafa
[J]. 2019 IEEE FIFTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2019), 2019, : 49 - 54
[32] A Customizable MapReduce Framework for Complex Data-Intensive Workflows on GPUs
Qiao, Zhi
Liang, Shuwen
Jiang, Hai
Fu, Song
[J]. 2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2015,
[33] Research on the architecture of data-intensive computing platform
Hou, Ke
Zhang, Jing
Fang, Xing
[J]. Journal of Software Engineering, 2015, 9 (03): : 686 - 701
[34] In-Memory Data Rearrangement for Irregular, Data-Intensive Computing
Lloyd, Scott
Gokhale, Maya
[J]. COMPUTER, 2015, 48 (08) : 18 - 25
[35] Nebula: Distributed Edge Cloud for Data Intensive Computing
Jonathan, Albert
Ryden, Mathew
Oh, Kwangsung
Chandra, Abhishek
Weissman, Jon
[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (11) : 3229 - 3242
[36] A capabilities-aware framework for using computational accelerators in data-intensive computing
Rafique, M. Mustafa
Butt, Ali R.
Nikolopoulos, Dimitrios S.
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2011, 71 (02) : 185 - 197
[37] Design of Self-Adjusting algorithm for data-intensive MapReduce Applications
Nagiwale, Amin Nazir
Umale, Manish R.
Sinha, Aditya Kumar
[J]. 2015 INTERNATIONAL CONFERENCE ON ENERGY SYSTEMS AND APPLICATIONS, 2015, : 506 - 510
[38] An Improved Bayesian Inference Method for Data-Intensive Computing
Ma, Feng
Liu, Weiyi
[J]. COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, 2012, 316 : 134 - 144
[39] Distributed data structure templates for data-intensive remote sensing applications
Ma, Yan
Wang, Lizhe
Liu, Dingsheng
Yuan, Tao
Liu, Peng
Zhang, Wanfeng
[J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2013, 25 (12) : 1784 - 1797
[40] Data Analysis using Hadoop MapReduce Environment
Merla, PrathyushaRani
Liang, Yiheng
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4783 - 4785
