A Framework for Multitasking Data-Intensive Management Services in High Performance Computing Environments

Authors
Kulasekaran, Sivakumar [1 ,3 ]
Esteva, Maria [1 ,3 ]
Trelogan, Jessica [2 ,3 ]
Liu, Si [1 ,3 ]
Affiliations
[1] Texas Adv Comp Ctr, Houston, TX 77054 USA
[2] Inst Class Archaeol, London, England
[3] Univ Texas Austin, Austin, TX 78712 USA
Source
2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015) | 2015
Keywords
Multitasking data management services; high performance computing; archaeology data; data intensive computing;
DOI
10.1109/BigDataService.2015.42
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted in a desktop computing environment by curators, these tasks can be labor-intensive and disruptive of research. While the process can be made much more efficient within a Data-Intensive High Performance Computing (DIC/HPC) infrastructure, it remains a challenge to implement generalizable services so that automated workflows can be easily performed by non-expert users. This paper introduces a framework for automating data management activities as data-intensive computing jobs within a multitasking workflow. Using as a case study a set of legacy data from an archaeological collection in need of reorganization, we identified the steps required to re-sort and move approximately 27,000 data files into a structured collection architecture. Because not all data management workflows are the same, and because there is a wide range of requirements for job submission within data-intensive HPC resources, we derived a set of generalizable modules that can be used as a guide for curators and HPC consultants. This framework may accommodate collections with different data types and data management requirements and can be conducted by curators trained in HPC usage but without ample computational expertise. Upon testing, we implemented the framework as a service on a DIC/HPC cluster.
Pages: 333 - 340
Page count: 8