A flexible I/O arbitration framework for netCDF-based big data processing workflows on high-end supercomputers

Cited by: 10
Authors
Liao, Jianwei [1, 2]
Gerofi, Balazs [3]
Lien, Guo-Yuan [3]
Miyoshi, Takemasa [3]
Nishizawa, Seiya [3]
Tomita, Hirofumi [3]
Liao, Wei-Keng [4]
Choudhary, Alok [4]
Ishikawa, Yutaka [3]
Affiliations
[1] Southwest Univ China, Coll Comp & Informat Sci, Tianshen Rd 2, Chongqing, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[3] RIKEN Adv Inst Computat Sci, Kobe, Hyogo, Japan
[4] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL USA
Funding
National Natural Science Foundation of China
Keywords
asynchronous transfer; big data processing; customizability; netCDF; parallel direct data transfer; real time; ENSEMBLE DATA ASSIMILATION; SYSTEM; MODEL; COUPLER;
DOI
10.1002/cpe.4161
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
As high-performance computing and Big Data processing converge, it has become increasingly common to deploy large-scale data analytics workloads on high-end supercomputers. Such applications often take the form of complex workflows with many components, assimilating data from scientific simulations as well as from measurements streamed from sensor networks, such as radars and satellites. For example, as part of the Flagship 2020 (post-K) supercomputer project of Japan, RIKEN is investigating the feasibility of a highly accurate weather forecasting system that would provide a real-time outlook for severe guerrilla rainstorms. One of the main performance bottlenecks of this application is the lack of efficient communication among workflow components, which currently takes place over the parallel file system. In this paper, we present an initial study of a direct communication framework designed for complex workflows that eliminates unnecessary file I/O among components. Specifically, we propose an I/O arbitration layer that provides direct parallel data transfer (both synchronous and asynchronous) among job components that rely on the netCDF interface for performing I/O operations. Our solution requires only minimal modifications to application code. Moreover, we propose a configuration file-based approach that allows users to specify the desired data transfer pattern among workflow components, offering a general solution for different application contexts. We present a preliminary evaluation of the proposed framework on the K Computer (running on up to 4800 compute nodes) using RIKEN's experimental weather forecasting workflow as a case study.
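The configuration-driven routing idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation or its configuration syntax: the one-rule-per-line format, the component names (`simulation`, `assimilation`), the variable names, and the functions `parse_rules` and `route_write` are all hypothetical, chosen only to show how a per-variable transfer pattern might decide between direct transfer and ordinary file I/O.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"


@dataclass(frozen=True)
class TransferRule:
    variable: str
    producer: str
    consumer: str
    mode: Mode


# Hypothetical config format (an assumption, not the paper's syntax):
#   <variable> <producer> -> <consumer> <sync|async>
EXAMPLE_CONFIG = """
# variables handed directly from one component to another
temperature simulation -> assimilation sync
pressure    simulation -> assimilation async
"""


def parse_rules(text):
    """Parse the hypothetical config into a (variable, producer) lookup table."""
    rules = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 5 or fields[2] != "->":
            raise ValueError(f"malformed rule: {raw!r}")
        variable, producer, _, consumer, mode = fields
        rules[(variable, producer)] = TransferRule(
            variable, producer, consumer, Mode(mode)
        )
    return rules


def route_write(rules, component, variable):
    """Decide where a write of `variable` by `component` should go.

    Returns "file" when no rule matches (fall back to the file system),
    otherwise "direct:<consumer>:<mode>".
    """
    rule = rules.get((variable, component))
    if rule is None:
        return "file"
    return f"direct:{rule.consumer}:{rule.mode.value}"


if __name__ == "__main__":
    rules = parse_rules(EXAMPLE_CONFIG)
    for var in ("temperature", "pressure", "diagnostics"):
        print(var, "->", route_write(rules, "simulation", var))
```

In a layer of this kind, a decision like the one in `route_write` would sit behind the application's netCDF write calls, so that variables without a matching rule still reach the parallel file system unchanged.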
Pages: 12