Data Analytics in the Cloud with Flexible MapReduce Workflows

Times cited: 0

Authors:

Goncalves, Carlos [1,2]

Assuncao, Luis [1,2]

Cunha, Jose C. [2]

Affiliations:

[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal

[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal

Source:

2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM) | 2012

Keywords:

MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids, or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application with MapReduce requires installing, configuring, and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. It would be desirable to have more flexibility in adjusting such configurations to the characteristics of each application. Furthermore, composing the multiple phases of a data analytic application requires specifying all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. We present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
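The abstract divides the work so that the user supplies only three functions (input processing, map, and reduce), and the workflow orchestrates the phases around them. The Python below is a minimal sketch of that division, assuming a simple word-count text-mining task. It is not the AWARD framework's API, which the abstract does not show. The names `split_input`, `map_fn`, `reduce_fn`, `run_mapreduce`, `top_terms`, and `run_workflow` are hypothetical, and a local thread pool stands in for the EC2 instances used in the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# --- User-defined functions (hypothetical names; the abstract says users
# --- supply input processing, map, and reduce, but not their signatures) ---

def split_input(text: str) -> List[str]:
    """Input-processing step: one record per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]

def map_fn(record: str) -> Iterable[Tuple[str, int]]:
    """Map: emit (word, 1) for every lower-cased token in a record."""
    for word in record.lower().split():
        yield word, 1

def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: total the counts emitted for one word."""
    return key, sum(values)

# --- Generic MapReduce node (a local stand-in for a cluster/cloud run) ---

def run_mapreduce(records: List[str],
                  mapper: Callable[[str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]],
                  num_workers: int = 2) -> Dict[str, int]:
    # Split the records round-robin into one partition per map worker.
    partitions = [records[i::num_workers] for i in range(num_workers)]

    def map_partition(part: List[str]) -> List[Tuple[str, int]]:
        return [pair for rec in part for pair in mapper(rec)]

    # Map phase: partitions are processed concurrently by a thread pool.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_partition, partitions))

    # Shuffle: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())

# --- Workflow: sequential phases, one of which is the MapReduce node ---

def top_terms(counts: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """Post-processing phase: the k most frequent terms, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def run_workflow(text: str) -> List[Tuple[str, int]]:
    # Each phase consumes the previous phase's output.
    phases = [
        split_input,
        lambda recs: run_mapreduce(recs, map_fn, reduce_fn),
        top_terms,
    ]
    data = text
    for phase in phases:
        data = phase(data)
    return data

if __name__ == "__main__":
    sample = "the cloud runs MapReduce\nthe workflow runs in the cloud\n"
    print(run_workflow(sample))
```

Run as a script, this prints `[('the', 3), ('cloud', 2), ('runs', 2)]`. `run_mapreduce` is kept generic, so a different text-mining application only needs new versions of the three user functions. This mirrors the abstract's claim that the end user defines only input processing, map, and reduce.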


Pages: 8

Similar records (50 in total; first 10 listed):

[1] Flexible MapReduce Workflows for Cloud Data Analytics
Goncalves, Carlos
Assuncao, Luis
Cunha, Jose C.
INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2013, 5 (04) : 48 - 64
[2] Enabling Big Data Analytics in the Hybrid Cloud using Iterative MapReduce
Clemente-Castello, Francisco J.
Nicolae, Bogdan
Katrinis, Kostas
Rafique, M. Mustafa
Mayo, Rafael
Carlos Fernandez, Juan
Loreti, Daniela
2015 IEEE/ACM 8TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2015, : 290 - 299
[3] Cross-Cloud MapReduce for Big Data
Li, Peng
Guo, Song
Yu, Shui
Zhuang, Weihua
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (02) : 375 - 386
[4] A Scheduling Algorithm for Hadoop MapReduce Workflows with Budget Constraints in the Heterogeneous Cloud
Wylie, Andrew
Shi, Wei
Corriveau, Jean-Pierre
Wang, Yang
2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 1433 - 1442
[5] Optimal construction of virtual networks for Cloud-based MapReduce workflows
Xu, Cong
Yang, Jiahai
Yin, Kevin
Yu, Hui
COMPUTER NETWORKS, 2017, 112 : 194 - 207
[6] AMPO: Algorithm for MapReduce Performance Optimization for Enhancing Big Data Analytics
Yambem, Nandita
Nandakumar, A. N.
2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017, : 717 - 723
[7] Parallelizing XML data-streaming workflows via MapReduce
Zinn, Daniel
Bowers, Shawn
Koehler, Sven
Ludaescher, Bertram
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2010, 76 (06) : 447 - 463
[8] MapReduce in the Cloud: Data-Location-Aware VM Scheduling
Nguyen, Tung
Shi, Weisong
ZTE COMMUNICATIONS, 2013, 11 (04) : 18 - 26
[9] A Customizable MapReduce Framework for Complex Data-Intensive Workflows on GPUs
Qiao, Zhi
Liang, Shuwen
Jiang, Hai
Fu, Song
2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2015,
[10] An Approach in Big Data Analytics to Improve the Velocity of Unstructured Data Using MapReduce
Sundarakumar, M. R.
Mahadevan, G.
Somula, Ramasubbareddy
Sennan, Sankar
Rawal, Bharat S.
INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS, 2021, 10 (04)
