A Framework for Multitasking Data-Intensive Management Services in High Performance Computing Environments

Cited by: 0
Authors
Kulasekaran, Sivakumar [1, 3]
Esteva, Maria [1, 3]
Trelogan, Jessica [2, 3]
Liu, Si [1, 3]
Affiliations
[1] Texas Adv Comp Ctr, Houston, TX 77054 USA
[2] Inst Class Archaeol, London, England
[3] Univ Texas Austin, Austin, TX 78712 USA
Source
2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015) | 2015
Keywords
Multitasking data management services; high performance computing; archaeology data; data intensive computing;
DOI
10.1109/BigDataService.2015.42
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Data management entails a continuum of tasks to develop sustainable and reusable collections throughout their lifecycle. Large collections with complex data formats and structures may require what we define as "multitasking data management," involving a combination of manual and automated iterative tasks. When conducted in a desktop computing environment by curators, these tasks can be labor-intensive and disruptive to research. While the process can be made much more efficient within a Data-Intensive High Performance Computing (DIC/HPC) infrastructure, it remains a challenge to implement generalizable services so that automated workflows can be easily performed by non-expert users. This paper introduces a framework for automating data management activities as data-intensive computing jobs within a multitasking workflow. Using as a case study a set of legacy data from an archaeological collection in need of reorganization, we identified the steps required to re-sort and move approximately 27,000 data files into a structured collection architecture. Because not all data management workflows are the same, and because there is a wide range of requirements for job submission within data-intensive HPC resources, we derived a set of generalizable modules that can be used as a guide for curators and HPC consultants. This framework may accommodate collections with different data types and data management requirements and can be conducted by curators trained in HPC usage but without ample computational expertise. Upon testing, we implemented the framework as a service on a DIC/HPC cluster.
Pages: 333-340
Number of pages: 8