Contributions to High-Performance Big Data Computing

Cited by: 2
Authors
Fox, Geoffrey [1]
Qiu, Judy [1]
Crandall, David [1]
Von Laszewski, Gregor [1]
Beckstein, Oliver [2]
Paden, John [3]
Paraskevakos, Ioannis [4]
Jha, Shantenu [4]
Wang, Fusheng [5]
Marathe, Madhav [6, 7]
Vullikanti, Anil [6, 7]
Cheatham, Thomas [8]
Affiliations
[1] Indiana Univ, Bloomington, IN USA
[2] Arizona State Univ, Tempe, AZ 85287 USA
[3] Kansas Univ, Lawrence, KS USA
[4] Rutgers State Univ, New Brunswick, NJ USA
[5] SUNY Stony Brook, Stony Brook, NY 11794 USA
[6] Virginia Tech, Blacksburg, VA USA
[7] Univ Virginia, Charlottesville, VA 22903 USA
[8] Univ Utah, Salt Lake City, UT 84112 USA
Keywords
HPC; Big Data; Clouds; Graph Analytics; Polar Science; Pathology; Biomolecular simulations; Network Science; MIDAS; SPIDAL; IMAGE REGISTRATION; DATA ANALYTICS; SOFTWARE; SYSTEM; SPARK; RECONSTRUCTION; LOCALIZATION; ALGORITHMS; LIBRARY; HADOOP
DOI
10.3233/APC190005
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Our project sits at the interface of Big Data and HPC (High-Performance Big Data computing). This paper describes a collaboration among seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of high-performance and Big Data computing, with several application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including HPC-ABDS (the High-Performance Computing enhanced Apache Big Data Stack), and an application use-case study that identifies the key features determining software and algorithm requirements. We summarize the middleware, including the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and pilot jobs. We then present SPIDAL, the Scalable Parallel Interoperable Data Analytics Library, and our work on it in core machine learning, image processing, and the application communities: Network Science, Polar Science, Biomolecular Simulations, Pathology, and Spatial Systems. Finally, we describe basic algorithms and their integration into end-to-end use cases.
Pages: 34-81
Page count: 48