Locality-Aware Workflow Orchestration for Big Data

Cited by: 3
Authors
Corodescu, Andrei-Alin [1]
Nikolov, Nikolay [2]
Khan, Akif Quddus [3]
Soylu, Ahmet [4]
Matskin, Mihhail [5]
Payberah, Amir H. [5]
Roman, Dumitru [2]
Affiliations
[1] Univ Oslo, Oslo, Norway
[2] SINTEF AS, Oslo, Norway
[3] Norwegian Univ Sci & Technol, Gjovik, Norway
[4] Oslo Metropolitan Univ, Oslo, Norway
[5] KTH Royal Inst Technol, Stockholm, Sweden
Source
13TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DIGITAL ECOSYSTEMS, MEDES 2021 | 2021
Funding
European Union Horizon 2020
Keywords
Big Data workflows; data locality; software containers; Edge
DOI
10.1145/3444757.3485106
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The development of the Edge computing paradigm shifts data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Such a paradigm requires data processing solutions that consider data locality in order to reduce the performance penalties from data transfers between remote (in network terms) data centres. However, existing Big Data processing solutions have limited support for handling data locality and are inefficient in processing the small and frequent events specific to Edge environments. This paper proposes a novel architecture and a proof-of-concept implementation for software container-centric Big Data workflow orchestration that puts data locality at the forefront. Our solution considers any available data locality information by default, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare our system with Argo Workflows and show significant improvements in execution speed when processing units of data with our data-locality-aware Big Data workflow approach.
Pages: 62-70
Page count: 9