Automating data-model workflows at a level 12 HUC scale: Watershed modeling in a distributed computing environment

Cited by: 25
Authors
Leonard, Lorne [1]
Duffy, Christopher J. [1]
Affiliation
[1] Pennsylvania State University, Department of Civil and Environmental Engineering, University Park, PA 16802, USA
Funding
National Science Foundation (USA)
Keywords
Distributed hydrological model; Data workflows; Data-model workflows; Model workflows; Provenance; Essential terrestrial variables; HydroTerre; PIHM; Geographic information science; Data as a service; Model as a service
DOI
10.1016/j.envsoft.2014.07.015
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
The prototype discussed in this article retrieves Essential Terrestrial Variable (ETV) web services and uses data-model workflows to transform ETV data for hydrological models in a distributed computing environment. The ETV workflow is a service layer over hundreds of terabytes of national datasets, bundled for fast data access in support of watershed modeling at the United States Geological Survey (USGS) Hydrologic Unit Code (HUC) level-12 scale. The ETV data have been proposed as the Essential Terrestrial Data necessary to construct watershed models anywhere in the continental USA (Leonard and Duffy, 2013). Here, we present the hardware and software system designs that support the ETV, data-model, and model workflows using High Performance Computing (HPC) and a service-oriented architecture. This infrastructure design is an important contribution because it determines both how and where the workflows operate. We describe how these workflow services operate in a distributed manner to model HUC-12 catchments across the conterminous United States (CONUS), using the Penn State Integrated Hydrologic Model (PIHM) as an example. The prototype is evaluated by generating data-model workflows for every CONUS HUC-12 and creating a repository of workflow provenance for every HUC-12 (~100 km²), which researchers can use as a starting point for a new hydrological model study. The provenance concept for data-model workflows developed here ensures that model simulations (e.g., reanalysis) can be reproduced from ETV datasets without storing model results, which we show would require many petabytes of storage. © 2014 Elsevier Ltd. All rights reserved.
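The provenance idea in the abstract (record how a model was built from ETV data so that a simulation can be regenerated rather than stored) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `WorkflowProvenance` class, its field names, the placeholder HUC-12 code, the dataset and step names, and the SHA-256 fingerprint over canonical JSON are all hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class WorkflowProvenance:
    """Hypothetical provenance record for one HUC-12 data-model workflow.

    All field names are illustrative assumptions; they are not the
    HydroTerre or PIHM schema.
    """
    huc12: str              # 12-digit hydrologic unit code of the catchment
    etv_sources: tuple      # (dataset, version) pairs of ETV inputs
    workflow_steps: tuple   # ordered data-model transformation steps
    model: str = "PIHM"
    parameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON encoding (sorted keys, no spaces)."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values; these are not real catchment or dataset identifiers.
run_a = WorkflowProvenance(
    huc12="000000000001",
    etv_sources=(("elevation", "v1"), ("soils", "v1"), ("forcing", "v1")),
    workflow_steps=("clip_to_huc12", "build_mesh", "map_params_to_mesh"),
    parameters={"start": "2010-01-01", "end": "2010-12-31"},
)
# Same content with a different dict key order: canonical JSON hides the order.
run_b = WorkflowProvenance(
    huc12="000000000001",
    etv_sources=(("elevation", "v1"), ("soils", "v1"), ("forcing", "v1")),
    workflow_steps=("clip_to_huc12", "build_mesh", "map_params_to_mesh"),
    parameters={"end": "2010-12-31", "start": "2010-01-01"},
)
# Changing one input version yields a different workflow identity.
run_c = replace(run_a, etv_sources=(("elevation", "v2"), ("soils", "v1"), ("forcing", "v1")))

print(run_a.fingerprint() == run_b.fingerprint())  # True
print(run_a.fingerprint() == run_c.fingerprint())  # False
```

A record like this is small compared with model output, so keeping the record and re-executing the workflow on demand reflects the storage trade-off the abstract describes; the fingerprint only gives each distinct input configuration a stable identifier.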
Pages: 174-190
Page count: 17
References (41 entries)
[31] Seaber, P. R., 1987. OPEN FILE REPORTS SE, p. 66.
[32] Shewchuk, J. R., 1997. DELAUNAY REFINEMENT.
[33] SWAT, 2013. Soil and Water Assessment Tool.
[34] Tarboton, D. G., 2009. 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, p. 988.
[35] Tarboton, D. G., 2011. TAUDEM HYDROLOGY RES.
[36] Turuncoglu, U. U., Dalfes, N., Murphy, S., DeLuca, C., 2013. Toward self-describing and workflow integrated Earth system models: A coupled atmosphere-ocean modeling system application. Environmental Modelling & Software 39, 247-262.
[37] World Wide Web Consortium, 2014. Extensible Markup Language.
[38] World Wide Web Consortium, 2014. Simple Object Access Protocol.
[39] World Wide Web Consortium, 2014. Web Services Description Language.
[40] XSEDE, 2014. SAN DIEG SUP CTR TRE.