MaLeFICE: Machine learning support for continuous performance improvement in computational engineering

Cited by: 0
Authors
Sonmezer, Hasan Berk [1]
Muhtaroglu, Nitel [1]
Ari, Ismail [1]
Gokcin, Deniz [1]
Affiliations
[1] Ozyegin Univ, Dept Comp Sci, Istanbul, Turkey
Keywords
batch scheduling; classification; cloud; clustering; finite element analysis; DevOps; docker; machine learning; virtualization; DESIGN;
DOI
10.1002/cpe.6674
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Computer-aided engineering (CAE) practices have improved drastically over the last decade, thanks to easier access to computing resources and open-source software. However, the increasing complexity of hardware and software settings and the scarcity of multiskilled personnel have again rendered the practice inefficient and infeasible. In this article, we present a method for continuous performance improvement in computational engineering that combines online performance profiling with machine learning (ML). To test the viability of this method, we provide a detailed analysis of solution time estimation for finite element analysis (FEA) jobs based on multidimensional models. These models combine numerous matrix features (matrix size, density, bandwidth, etc.), solver features (direct or iterative, preconditioning, tolerance), and hardware features (core count, virtual or physical). We repeat our analysis on different machines as well as in Docker containers to demonstrate applicability across platforms. Next, we train supervised and unsupervised ML algorithms on commonly used, realistic FEA benchmarks and compare the accuracy of the resulting models. Finally, we design two new ML-based online batch schedulers, shortest predicted time first (SPTF) and shortest cluster time first (SCTF), whose performance is comparable to that of the optimal but offline shortest job first (SJF) scheduler. We find that ML-based profiling and scheduling can reduce average turnaround times by 2x to 5x over other alternatives.
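The scheduling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the job names, the runtimes, and the `predicted` table (a stand-in for the output of a trained ML model) are hypothetical, and the sketch assumes a single machine with all jobs submitted at time zero. It compares the average turnaround time of first-come-first-served ordering with an ordering that runs the job with the shortest predicted time first.

```python
from typing import Dict, List, Tuple

Job = Tuple[str, float]  # (job name, actual runtime in seconds); hypothetical data


def average_turnaround(jobs: List[Job]) -> float:
    """Average completion time when jobs run back to back in the given order.

    Assumes one machine and that every job is submitted at t = 0.
    """
    clock = 0.0
    total = 0.0
    for _name, runtime in jobs:
        clock += runtime   # this job finishes at the current clock value
        total += clock
    return total / len(jobs)


def shortest_predicted_first(jobs: List[Job], predicted: Dict[str, float]) -> List[Job]:
    """Order jobs by predicted runtime (ascending), not by actual runtime."""
    return sorted(jobs, key=lambda job: predicted[job[0]])


# Hypothetical FEA jobs in arrival order, with their actual runtimes.
jobs: List[Job] = [("plate", 100.0), ("beam", 10.0), ("shell", 40.0)]

# Hypothetical model predictions; imperfect, but they preserve the ranking.
predicted: Dict[str, float] = {"plate": 90.0, "beam": 12.0, "shell": 35.0}

fcfs_avg = average_turnaround(jobs)
sptf_avg = average_turnaround(shortest_predicted_first(jobs, predicted))

print(f"FCFS average turnaround: {fcfs_avg}")  # 120.0
print(f"SPTF average turnaround: {sptf_avg}")  # 70.0
```

In this toy case the predictions are inexact but still rank the jobs correctly, so the predicted-time ordering matches what an offline shortest-job-first scheduler would choose. Predictions that misrank jobs would narrow the gap.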
Pages: 16