A simulation provenance data management system for efficient job execution on an online computational science engineering platform

Cited by: 1
Authors
Ma, Jin [1]
Lee, Sik [1]
Cho, Kum Won [1]
Suh, Young-Kyoon [2]
Affiliations
[1] Korea Inst Sci & Technol Informat, Natl Inst Supercomp & Networking, Daejeon, South Korea
[2] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2019 / Vol. 22 / Issue 01
Funding
National Research Foundation, Singapore;
Keywords
EDISON; HPC; PROV; Provenance; Science app; Simulation;
DOI
10.1007/s10586-018-2827-2
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In the past few years, an online simulation service platform named EDISON has been applauded by computational science and engineering communities in several countries. Though armed with multiple computing clusters and high-end storage resources, the platform has struggled to handle a huge number of CPU-/IO-bound simulations, most of which are duplicates. Such intensive simulations are normally admitted with no duplicate elimination and can thus adversely affect the performance of the platform. To address this performance concern, we propose a novel system, termed SuperMan, that seamlessly records and retrieves the provenances of previously executed simulations and so prevents users from initiating duplicate and/or similar simulations on the limited computing resources. The system collects simulation provenances in a variant of a de facto standard format, thereby offering interoperability. Based on the stored provenances, the system can provide useful simulation run statistics to users who need assistance. SuperMan also applies a hash-based duplicate elimination technique, making simulation execution on the platform more efficient. Finally, we show that the proposed system could remove slightly over half of the duplicate simulations across a variety of simulation software while saving about 30% of overall elapsed time and about 25% of queuing time.
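The abstract names a hash-based duplicate elimination technique but does not describe its details. The following is only a minimal sketch of the general idea, under stated assumptions: a simulation request is reduced to a canonical JSON string, hashed with SHA-256, and looked up in an in-memory table of earlier runs. The names `provenance_key`, `ProvenanceStore`, and `submit`, and the choice of application name, version, and input parameters as the identifying fields, are hypothetical and not taken from the paper.

```python
import hashlib
import json


def provenance_key(app, version, params):
    """Hash a simulation request (hypothetical fields: app, version, params).

    Sorting the keys makes the hash independent of parameter order,
    so {"a": 1, "b": 2} and {"b": 2, "a": 1} map to the same key.
    """
    canonical = json.dumps(
        {"app": app, "version": version, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceStore:
    """In-memory stand-in for a provenance database (an assumption)."""

    def __init__(self):
        self._records = {}  # hash key -> stored result of an earlier run
        self.hits = 0       # requests answered from stored provenance
        self.misses = 0     # requests that actually had to run

    def submit(self, app, version, params, run):
        """Return a stored result for a duplicate request, else run and record."""
        key = provenance_key(app, version, params)
        if key in self._records:
            self.hits += 1
            return self._records[key]
        self.misses += 1
        result = run(params)
        self._records[key] = result
        return result
```

In this sketch, a second submission with the same application, version, and parameters never reaches `run`. It catches only exact duplicates; detecting the "similar" simulations mentioned in the abstract would need a different comparison.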
Pages: 147 - 159
Page count: 13
Related papers
17 in total
  • [1] DataSpaces: an interaction and coordination framework for coupled simulation workflows
    Docan, Ciprian
    Parashar, Manish
    Klasky, Scott
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2012, 15 (02) : 163 - 181
  • [2] ECMA, ECMA-404
  • [3] The NEEShub Cyberinfrastructure for Earthquake Engineering
    Hacker, Thomas J.
    Eigenmann, Rudi
    Bagchi, Saurabh
    Irfanoglu, Ayhan
    Pujol, Santiago
    Catlin, Ann
    Rathje, Ellen
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (04) : 67 - 77
  • [4] Scibox: Online Sharing of Scientific Data via the Cloud
    Huang, Jian
    Zhang, Xuechen
    Eisenhauer, Greg
    Schwan, Karsten
    Wolf, Matthew
    Ethier, Stephane
    Klasky, Scott
    [J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014
  • [5] nanoHUB.org: Advancing education and research in nanotechnology
    Klimeck, Gerhard
    McLennan, Michael
    Brophy, Sean B.
    Adams, George B., III
    Lundstrom, Mark S.
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2008, 10 (05) : 17 - 23
  • [6] Development of a simulation result management and prediction system using machine learning techniques
    Lee, Ki Yong
    Suh, Young-Kyoon
    Cho, Kum Won
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2017, 19 (01) : 75 - 96
  • [7] Liferay, LIF PORT 6 2
  • [8] Design and Implementation of Information Management Tools for the EDISON Open Platform
    Ma, Jin
    Lee, Jongsuk Ruth
    Cho, Kumwon
    Park, Minjae
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2017, 11 (02) : 1089 - 1104
  • [9] HUBzero: A Platform for Dissemination and Collaboration in Computational Science and Engineering
    McLennan, Michael
    Kennell, Rick
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2010, 12 (02) : 48 - 52
  • [10] Mishin D, 2014, ASTR SOC P, V485, P465