A Framework for Multitasking Data-Intensive Management Services in High Performance Computing Environments

Cited by: 0

Authors:

Kulasekaran, Sivakumar [1,3]

Esteva, Maria [1,3]

Trelogan, Jessica [2,3]

Liu, Si [1,3]

Affiliations:

[1] Texas Adv Comp Ctr, Houston, TX 77054 USA

[2] Inst Class Archaeol, London, England

[3] Univ Texas Austin, Austin, TX 78712 USA

Source:

2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015) | 2015

Keywords:

Multitasking data management services; high performance computing; archaeology data; data intensive computing;

DOI:

10.1109/BigDataService.2015.42

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted in a desktop computing environment by curators, these tasks can be labor-intensive and disruptive to research. While the process can be made much more efficient within a Data-Intensive High Performance Computing (DIC/HPC) infrastructure, it remains a challenge to implement generalizable services so that automated workflows can be easily performed by non-expert users. This paper introduces a framework for automating data management activities as data-intensive computing jobs within a multitasking workflow. Using as a case study a set of legacy data from an archaeological collection in need of reorganization, we identified the steps required to re-sort and move approximately 27,000 data files into a structured collection architecture. Because not all data management workflows are the same, and because there is a wide range of requirements for job submission within data-intensive HPC resources, we derived a set of generalizable modules that can be used as a guide for curators and HPC consultants. This framework may accommodate collections with different data types and data management requirements and can be conducted by curators trained in HPC usage but without ample computational expertise. Upon testing, we implemented the framework as a service on a DIC/HPC cluster.


Pages: 333-340

Page count: 8

50 related records in total

[21] Extreme Data-Intensive Scientific Computing
Szalay, Alexander S.
COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (06) : 34 - 41
[22] THE CHANGING PARADIGM OF DATA-INTENSIVE COMPUTING
Kouzes, Richard T.
Anderson, Gordon A.
Elbert, Stephen T.
Gorton, Ian
Gracio, Deborah K.
COMPUTER, 2009, 42 (01) : 26 - 34
[23] Utility-Driven Data Management for Data-Intensive Applications in Fog Environments
Cappiello, Cinzia
Pernici, Barbara
Plebani, Pierluigi
Vitali, Monica
ADVANCES IN CONCEPTUAL MODELING, ER 2017, 2017, 10651 : 216 - 226
[24] Survey of Scientific Programming Techniques for the Management of Data-Intensive Engineering Environments
Maria Alvarez-Rodriguez, Jose
Alor-Hernandez, Giner
Mejia-Miranda, Jezreel
SCIENTIFIC PROGRAMMING, 2018, 2018
[25] Mochi: Composing Data Services for High-Performance Computing Environments
Ross, Robert B.
Amvrosiadis, George
Carns, Philip
Cranor, Charles D.
Dorier, Matthieu
Harms, Kevin
Ganger, Greg
Gibson, Garth
Gutierrez, Samuel K.
Latham, Robert
Robey, Bob
Robinson, Dana
Settlemyer, Bradley
Shipman, Galen
Snyder, Shane
Soumagne, Jerome
Zheng, Qing
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2020, 35 (01) : 121 - 144
[26] Mochi: Composing Data Services for High-Performance Computing Environments
Robert B. Ross
George Amvrosiadis
Philip Carns
Charles D. Cranor
Matthieu Dorier
Kevin Harms
Greg Ganger
Garth Gibson
Samuel K. Gutierrez
Robert Latham
Bob Robey
Dana Robinson
Bradley Settlemyer
Galen Shipman
Shane Snyder
Jerome Soumagne
Qing Zheng
Journal of Computer Science and Technology, 2020, 35 : 121 - 144
[27] On a Meaningful Integration of Web Services in Data-Intensive Biomedical Environments: The DICODE Approach
de la Calle, Guillermo
Garcia-Remesal, Miguel
Tzagarakis, Manolis
Christodoulou, Spyros
Tsiliki, Georgia
Karacapilidis, Nikos
2012 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2012,
[28] A theory of data-intensive software services
Ma, Hui
Schewe, Klaus-Dieter
Thalheim, Bernhard
Wang, Qing
SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2009, 3 (04) : 263 - 283
[29] A resource management system for data-intensive applications in desktop grid environments
Toyama, Toshiaki
Yamada, Yoshito
Konishi, Katsumi
PROCEEDINGS OF THE 18TH IASTED INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING AND SYSTEMS, 2006, : 60 - +
[30] A Survey of Semantics-Aware Performance Optimization for Data-Intensive Computing
Rao, Bingbing
Wang, Liqang
2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 81 - 88
