Towards autonomic data management for staging-based coupled scientific workflows

Cited by: 8
Authors
Jin T. [1]
Zhang F. [1]
Sun Q. [1]
Romanus M. [1]
Bui H. [1]
Parashar M. [1]
Affiliation
[1] Rutgers Discovery Informatics Institute, Piscataway, NJ 08854
Funding
National Science Foundation (USA)
Keywords
Autonomic computing; Data management; Data staging; HPC workflow; In-situ
DOI
10.1016/j.jpdc.2020.07.002
Chinese Library Classification (CLC) number
C93 [Management science]
Discipline classification code
12; 1201; 1202; 120202
Abstract
Emerging scientific workflows running at extreme scale are composed of multiple applications that interact and exchange data at runtime. While staging-based approaches, e.g., in-situ/in-transit processing, are promising, dynamic behaviors (e.g., data volumes and distributions) in coupled applications and varying resource constraints at runtime make the efficient use of these techniques challenging. Addressing these challenges requires fundamental changes in the way that workflows are executed at runtime. Specifically, it is necessary to monitor the operating environment and running applications, and then adapt and tune application behaviors and resource allocations at runtime while meeting the data management requirements and constraints. In this paper, we propose a policy-based autonomic data management (ADM) approach that can adaptively respond at runtime to dynamic data management requirements. We first formulate the schematic abstraction of this ADM approach, including its conceptual model and system elements. Then, we explore the realization of the ADM runtime and demonstrate how to achieve adaptations in a cross-layer manner with pre-defined autonomic policies. We also prototype our ADM approach and evaluate its performance on the Intrepid IBM-BlueGene and Titan Cray-XK7 systems using Chombo-based AMR applications and a visualization application. The experimental results demonstrate its effectiveness in meeting user-defined objectives and accelerating overall scientific discovery. © 2020 Elsevier Inc.
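Illustrative sketch (not from the paper): the abstract describes monitoring runtime state and adapting data placement through pre-defined policies. The Python fragment below shows one possible shape of such a policy check. The names Metrics, Policy, decide and POLICIES, the 0.8 threshold, and the action labels are all hypothetical assumptions made for illustration; the authors' ADM runtime and policy language are not described in this record and may differ substantially.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Metrics:
    """Hypothetical snapshot of monitored runtime state (assumed fields)."""
    data_volume_gb: float   # data produced by the simulation in this step
    staging_free_gb: float  # free memory currently available on staging nodes


@dataclass
class Policy:
    """A pre-defined rule: when condition holds, apply the named action."""
    name: str
    condition: Callable[[Metrics], bool]
    action: str


def decide(metrics: Metrics, policies: List[Policy],
           default: str = "in-transit") -> str:
    """Return the action of the first policy whose condition matches."""
    for policy in policies:
        if policy.condition(metrics):
            return policy.action
    return default


# Assumed example policies, checked in priority order.
POLICIES = [
    # Output does not fit in staging memory: process where the data lives.
    Policy("staging-overflow",
           lambda m: m.data_volume_gb > m.staging_free_gb,
           "in-situ"),
    # Staging memory is nearly exhausted: reduce the data before moving it.
    Policy("staging-pressure",
           lambda m: m.data_volume_gb > 0.8 * m.staging_free_gb,
           "in-transit-downsampled"),
]

if __name__ == "__main__":
    for volume in (10.0, 3.5, 1.0):
        snapshot = Metrics(data_volume_gb=volume, staging_free_gb=4.0)
        print(volume, decide(snapshot, POLICIES))
```

In this sketch the rules are evaluated in order and the first match wins, which keeps the decision deterministic; a real runtime would presumably re-evaluate the policies at each coupling step as the monitored values change.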
Pages: 35-51
Page count: 16