Automating data-model workflows at a level 12 HUC scale: Watershed modeling in a distributed computing environment

Times cited: 25
Authors
Leonard, Lorne [1]
Duffy, Christopher J. [1]
Affiliation
[1] Pennsylvania State University, Department of Civil and Environmental Engineering, University Park, PA 16802, USA
Funding
National Science Foundation (USA)
Keywords
Distributed hydrological model; Data workflows; Data-model workflows; Model workflows; Provenance; Essential terrestrial variables; HydroTerre; PIHM; Geographic information science; Data as a service; Model as a service;
DOI
10.1016/j.envsoft.2014.07.015
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
The prototype discussed in this article retrieves Essential Terrestrial Variable (ETV) web services and uses data-model workflows to transform ETV data for hydrological models in a distributed computing environment. The ETV workflow is a service layer over hundreds of terabytes of national datasets, bundled for fast data access in support of watershed modeling at the United States Geological Survey (USGS) Hydrologic Unit Code (HUC) level-12 scale. The ETV data have been proposed as the Essential Terrestrial Data necessary to construct watershed models anywhere in the continental USA (Leonard and Duffy, 2013). Here, we present the hardware and software system designs that support the ETV, data-model, and model workflows using High Performance Computing (HPC) and a service-oriented architecture. This infrastructure design is an important contribution to both how and where the workflows operate. We describe in detail how these workflow services operate in a distributed manner for modeling CONUS HUC-12 catchments, using the Penn State Integrated Hydrologic Model (PIHM) as an example. The prototype is evaluated by generating data-model workflows for every CONUS HUC-12 and by creating a repository of workflow provenance for every HUC-12 (~100 km²) that researchers can use as a starting point for a new hydrological modeling study. The concept of provenance for data-model workflows developed here ensures reproducibility of model simulations (e.g. reanalysis) from ETV datasets without storing model results, which, as we show, would require many petabytes of storage. © 2014 Elsevier Ltd. All rights reserved.
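The provenance idea in the abstract can be illustrated with a minimal sketch. Assuming each data-model workflow is deterministic and the ETV inputs it pulls remain retrievable, a model run can be identified by a hash of its inputs and processing steps rather than by its stored outputs; regenerating results then means replaying the recorded workflow. All field names, dataset labels, parameters, and the SHA-256-over-canonical-JSON scheme below are hypothetical illustrations, not the HydroTerre or PIHM implementation. The helper `huc_parents` relies only on the standard USGS convention that each hydrologic unit level adds two digits to the code, from the 2-digit region down to the 12-digit subwatershed.

```python
"""Minimal sketch: a provenance record for one HUC-12 data-model workflow.

Hypothetical throughout: field names, dataset labels, and the hashing scheme
are illustrative assumptions, not the HydroTerre/PIHM implementation.
"""
import hashlib
import json
from dataclasses import asdict, dataclass, field

# USGS hydrologic unit hierarchy: each level appends two digits to the code.
HUC_LEVELS = {2: "region", 4: "subregion", 6: "basin",
              8: "subbasin", 10: "watershed", 12: "subwatershed"}


def huc_parents(huc12: str) -> dict:
    """Map each hierarchy level name to its code prefix for a 12-digit HUC."""
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError(f"not a 12-digit HUC code: {huc12!r}")
    return {name: huc12[:n] for n, name in HUC_LEVELS.items()}


@dataclass(frozen=True)
class WorkflowProvenance:
    huc12: str
    inputs: dict                                # hypothetical: ETV dataset -> version label
    steps: list = field(default_factory=list)   # hypothetical: ordered step parameters
    model: str = "PIHM"

    def digest(self) -> str:
        """SHA-256 of the record's canonical JSON (keys sorted, no whitespace)."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    record = WorkflowProvenance(
        huc12="020501060304",                            # hypothetical example code
        inputs={"dem": "NED-v1", "soils": "SSURGO-v1"},  # hypothetical labels
        steps=[{"step": "mesh", "max_area_km2": 0.5}],   # hypothetical parameter
    )
    print(huc_parents(record.huc12)["subbasin"])  # prints 02050106
    print(record.digest())
```

Sorting keys before hashing makes the digest independent of dictionary insertion order, so two descriptions of the same workflow yield the same identifier, while any change to an input version or step parameter yields a different one.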
Pages: 174-190
Page count: 17
References (41 in total)
[1] Adobe Developers Association, 1992. TIFF Revision 6.0, p. 121.
[2] [Anonymous], 2002. ACM Transactions on Internet Technology. DOI 10.1145/514183.514185.
[3] [Anonymous], 2013. USGS HUC.
[4] Arguez, Anthony; Vose, Russell S., 2011. The Definition of the Standard WMO Climate Normal: The Key to Deriving Alternative Climate Normals. Bulletin of the American Meteorological Society 92(6), 699.
[5] Bell, Michael, 2008. Service-Oriented Modeling: Service Analysis, Design, and Architecture.
[6] Bell, Michael, 2010. SOA Modeling Patterns for Service Oriented Discovery and Analysis, 1st ed.
[7] Bhatt, G., 2008. iEMSs 2008: International Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making, p. 8.
[8] Cheng, C., 2012. DELAUNAY MESH GENERA, p. 410.
[9] CyberSTAR, 2014. SCAL TER ADV RES DIS.
[10] Dal Santo, M. Alves, 2010. Facing the Challenges - Building the Capacity, p. 11.