A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks

Times cited: 3
Authors
Ihde, Nina [1]
Marten, Paula [1]
Eleliemy, Ahmed [2]
Poerwawinata, Gabrielle [2]
Silva, Pedro [1]
Tolovski, Ilin [1]
Ciorba, Florina M. [2]
Rabl, Tilmann [1]
Affiliations
[1] Hasso Plattner Institute, Potsdam, Germany
[2] University of Basel, Basel, Switzerland
Source
PERFORMANCE EVALUATION AND BENCHMARKING, TPCTC 2021 | 2022 / Vol. 13169
Keywords
Benchmarking; Big Data; HPC; Machine Learning; PARALLEL;
DOI
10.1007/978-3-030-94437-7_7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, Big Data (BD), High Performance Computing (HPC), and Machine Learning (ML) systems have been converging. This convergence is driven by the increasing complexity of long data analysis pipelines that run on separate software stacks. As data analytics pipelines grow more complex, the systems that run them need to be evaluated so that informed decisions can be made about technology selection and about the sizing and scoping of hardware. While there are many benchmarks for each of these domains, these benchmarking efforts have not converged. As a first step, it is necessary to understand how the individual benchmark domains relate to one another. In this work, we analyze some of the most expressive and recent benchmarks of BD, HPC, and ML systems. We propose a taxonomy of these systems based on individual dimensions, such as accuracy metrics, and common dimensions, such as workload type. Moreover, we aim to enable users to apply our taxonomy to identify suitable benchmarks for their BD, HPC, and ML systems. Finally, we identify challenges and research directions for the future of converged BD, HPC, and ML system benchmarking.
Pages: 98-118
Page count: 21