DALiuGE: A graph execution framework for harnessing the astronomical data deluge

Cited by: 28
Authors
Wu, C. [1]
Tobar, R. [1]
Vinsen, K. [1]
Wicenec, A. [1]
Pallot, D. [1]
Lao, B. [2]
Wang, R. [3]
An, T. [2]
Boulton, M. [1]
Cooper, I. [1]
Dodson, R. [1]
Dolensky, M. [1]
Mei, Y. [4, 5]
Wang, F. [4, 5]
Affiliations
[1] Univ Western Australia, ICRAR, M468, 35 Stirling Highway, Perth, WA 6009, Australia
[2] Shanghai Astron Observ, Shanghai, Peoples R China
[3] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
[4] Kunming Univ Sci & Technol, Kunming, Yunnan, Peoples R China
[5] Chinese Acad Sci, Yunnan Observ, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Dataflow; Graph execution engine; Data driven; Square kilometre array; Many-task computing
DOI
10.1016/j.ascom.2017.03.007
Chinese Library Classification number
P1 [Astronomy]
Discipline classification code
0704
Abstract
The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both datasets and algorithmic components, and a run-time implementation that executes such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers the processing on itself. This decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry datasets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities. © 2017 Elsevier B.V. All rights reserved.
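The data-activated idea in the abstract, where a completed data item triggers the processing that consumes it, can be illustrated with a minimal sketch. This is not the DALiuGE API: the class names `DataItem` and `Task`, their methods, and the toy pipeline are all hypothetical and chosen only to show the control flow, assuming a single process and no error handling.

```python
# Illustrative sketch only: DataItem, Task and run_example are hypothetical
# names and do not correspond to the DALiuGE API.
from typing import Callable, List, Optional


class DataItem:
    """A piece of data that notifies its consumers once it is complete."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[object] = None
        self.completed = False
        self.consumers: List["Task"] = []

    def complete(self, value: object) -> None:
        # Completion of the data is the only trigger; there is no central
        # scheduler loop polling for runnable tasks.
        self.value = value
        self.completed = True
        for task in self.consumers:
            task.input_completed()


class Task:
    """An algorithmic component that runs once all of its inputs are complete."""

    def __init__(self, func: Callable[..., object],
                 inputs: List[DataItem], output: DataItem) -> None:
        self.func = func
        self.inputs = inputs
        self.output = output
        # Register with each input so the data can activate this task.
        for item in inputs:
            item.consumers.append(self)

    def input_completed(self) -> None:
        if all(item.completed for item in self.inputs):
            result = self.func(*(item.value for item in self.inputs))
            # Completing the output propagates activation downstream.
            self.output.complete(result)


def run_example() -> object:
    """Build a two-stage toy pipeline and drive it by completing its inputs."""
    raw_a, raw_b = DataItem("raw_a"), DataItem("raw_b")
    summed, scaled = DataItem("summed"), DataItem("scaled")
    Task(lambda a, b: a + b, [raw_a, raw_b], summed)
    Task(lambda s: s * 10, [summed], scaled)
    raw_a.complete(1)   # nothing runs yet: raw_b is still pending
    raw_b.complete(2)   # both stages now fire in sequence
    return scaled.value
```

In this sketch nothing outside the graph decides when a task runs; completing `raw_b` is what causes both stages to execute. A distributed run-time would additionally need placement, messaging between nodes and failure handling, none of which is modelled here.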
Pages: 1-15
Number of pages: 15