A Framework for Multitasking Data-Intensive Management Services in High Performance Computing Environments

Citations: 0
Authors
Kulasekaran, Sivakumar [1 ,3 ]
Esteva, Maria [1 ,3 ]
Trelogan, Jessica [2 ,3 ]
Liu, Si [1 ,3 ]
Affiliations
[1] Texas Adv Comp Ctr, Austin, TX 78758 USA
[2] Inst Class Archaeol, Austin, TX USA
[3] Univ Texas Austin, Austin, TX 78712 USA
Source
2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015) | 2015
Keywords
Multitasking data management services; high performance computing; archaeology data; data intensive computing
DOI
10.1109/BigDataService.2015.42
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted by curators in a desktop computing environment, these tasks can be labor-intensive and disruptive to research. While the process can be made much more efficient within a Data-Intensive Computing and High Performance Computing (DIC/HPC) infrastructure, it remains a challenge to implement generalizable services so that automated workflows can be easily performed by non-expert users. This paper introduces a framework for automating data management activities as data-intensive computing jobs within a multitasking workflow. Using a set of legacy data from an archaeological collection in need of reorganization as a case study, we identified the steps required to re-sort and move approximately 27,000 data files into a structured collection architecture. Because not all data management workflows are the same, and because there is a wide range of requirements for job submission within data-intensive HPC resources, we derived a set of generalizable modules that can be used as a guide for curators and HPC consultants. This framework may accommodate collections with different data types and data management requirements and can be conducted by curators trained in HPC usage but without extensive computational expertise. Upon testing, we implemented the framework as a service on a DIC/HPC cluster.
Pages: 333-340
Page count: 8