Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions

Cited by: 40
Authors
Barika, Mutaz [1]
Garg, Saurabh [1]
Zomaya, Albert Y. [2]
Wang, Lizhe [3]
Van Moorsel, Aad [4]
Ranjan, Rajiv [3, 4]
Affiliations
[1] Univ Tasmania, Coll Sci & Engn, Sch Technol Environm & Design TED, Discipline ICT, Hobart, Tas 7001, Australia
[2] Univ Sydney, Fac Engn, Sch Comp Sci, J12 Comp Sci Bldg, Sydney, NSW, Australia
[3] China Univ Geosci, Sch Comp Sci, 388 Lumo Rd, Wuhan, Hubei, Peoples R China
[4] Newcastle Univ, Sch Comp, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Funding
Natural Environment Research Council (UK)
Keywords
Big data; cloud computing; workflow orchestration; research taxonomy; approaches; techniques; DATA-INTENSIVE APPLICATIONS; SCIENTIFIC WORKFLOWS; DATA-MANAGEMENT; DATA ANALYTICS; TAXONOMY; SYSTEMS; OPTIMIZATION; SELECTION;
DOI
10.1145/3332301
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Interest in processing big data has increased rapidly to gain insights that can transform businesses, government policies, and research outcomes. This has led to advancement in communication, programming, and processing technologies, including cloud computing services and technologies such as Hadoop, Spark, and Storm. This trend also affects the needs of analytical applications, which are no longer monolithic but composed of several individual analytical steps running in the form of a workflow. These big data workflows are vastly different in nature from traditional workflows. Researchers are currently facing the challenge of how to orchestrate and manage the execution of such workflows. In this article, we discuss in detail orchestration requirements of these workflows as well as the challenges in achieving these requirements. We also survey current trends and research that supports orchestration of big data workflows and identify open research challenges to guide future developments in this area.
Pages: 41