BSP cost and scalability analysis for MapReduce operations

Cited by: 9

Authors:

Senger, Hermes [1]

Gil-Costa, Veronica [2]

Arantes, Luciana [3]

Marcondes, Cesar A. C. [1]

Marin, Mauricio [4]

Sato, Liria M. [5]

da Silva, Fabricio A. B. [6]

Affiliations:

[1] Fed Univ Sao Carlos UFSCar, Sao Carlos, SP, Brazil

[2] UNSL CONICET, San Luis, Argentina

[3] Univ Paris 06, CNRS, INRIA, LIP6, REGAL, Paris, France

[4] Univ Santiago, DIINF, CeBiB, Santiago, Chile

[5] Univ Sao Paulo, BR-05508 Sao Paulo, SP, Brazil

[6] Fiocruz MS, Oswaldo Cruz Fdn, BR-21045900 Rio De Janeiro, Brazil

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016 / Vol. 28 / Issue 08

Funding:

Sao Paulo Research Foundation, Brazil;

Keywords:

MapReduce; Hadoop; scalability; BSP; PARALLEL; OPTIMIZATION; SCIENCE; SEARCH; HADOOP; MPI;

DOI:

10.1002/cpe.3628

Chinese Library Classification number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Data abundance creates a need for powerful, easy-to-use tools that support processing large amounts of data. MapReduce has been increasingly adopted by many companies for over a decade, and more recently it has attracted the attention of a growing number of researchers in several areas. One main advantage is that the complex details of parallel processing, such as network programming, task scheduling, data placement, and fault tolerance, are hidden in a conceptually simple framework. MapReduce is supported by mature software technologies for deployment in data centers, such as Hadoop. As MapReduce becomes popular for high-performance applications, many questions arise concerning its performance and efficiency. In this paper, we formally demonstrate lower bounds on the isoefficiency function for MapReduce applications, when these applications can be modeled as BSP jobs. We also demonstrate how communication and synchronization costs can be dominant for MapReduce computations and discuss the conditions under which such scalability limits are valid. To our knowledge, this is the first study that demonstrates scalability bounds for MapReduce applications. We also discuss how some MapReduce implementations such as Hadoop can mitigate such costs to approach linear, or near-linear, speedups. Copyright (c) 2015 John Wiley & Sons, Ltd.
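The abstract's point about communication and synchronization costs can be illustrated with the textbook BSP cost formula, in which a superstep costs w + h*g + l (local work, h-relation times per-word gap, barrier latency). The sketch below is not the paper's derivation or its bounds. It is a toy model under stated assumptions: a MapReduce job is treated as two supersteps with evenly split work, every input item is shuffled once, and the function names and parameters (`mapreduce_bsp_time`, `shuffle_fraction`, and so on) are hypothetical.

```python
def bsp_superstep_cost(w, h, g, l):
    """Textbook BSP cost of one superstep.

    w: maximum local computation on any processor
    h: maximum number of words sent or received by any processor
    g: communication cost per word (gap)
    l: barrier synchronization latency
    """
    return w + h * g + l


def mapreduce_bsp_time(n, p, g, l, shuffle_fraction=1.0):
    """Toy parallel time for a MapReduce job on p processors.

    Assumption (not from the paper): two supersteps, work split evenly,
    and a fraction `shuffle_fraction` of the n items crosses the network.
    """
    # Superstep 1: map work followed by the shuffle and a barrier.
    map_cost = bsp_superstep_cost(n / p, shuffle_fraction * n / p, g, l)
    # Superstep 2: reduce work with no further communication, then a barrier.
    reduce_cost = bsp_superstep_cost(n / p, 0, g, l)
    return map_cost + reduce_cost


def efficiency(n, p, g, l):
    """Efficiency = sequential time / (p * parallel time)."""
    sequential_time = 2 * n  # map + reduce work, no shuffle, no barriers
    return sequential_time / (p * mapreduce_bsp_time(n, p, g, l))


if __name__ == "__main__":
    # For a fixed input size, efficiency drops as processors are added,
    # because the barrier term l is paid on every processor.
    for p in (1, 10, 100, 1000):
        print(p, round(efficiency(n=1_000_000, p=p, g=1.0, l=1000.0), 3))
```

In this toy model the efficiency simplifies to 2n / (2n + g*n + 2*p*l), so keeping efficiency constant as p grows requires n to grow at least in proportion to p. That is the kind of relationship an isoefficiency function captures, although the bounds in the paper are derived under its own assumptions.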


Pages: 2503-2527

Page count: 25
