Application-Agnostic Auto-tuning of Open MPI Collectives using Bayesian Optimization

Times cited: 0
Authors
Jeannot, Emmanuel [1 ]
Lemarinier, Pierre [2 ]
Mercier, Guillaume [3 ]
Robert-Hayek, Sophie [4 ]
Sartori, Richard [5 ]
Affiliations
[1] Univ. Bordeaux, LaBRI, Inria, Talence, France
[2] Atos, Echirolles, France
[3] Univ. Bordeaux, Inria, LaBRI, Bordeaux INP, Talence, France
[4] Univ. Lorraine, Atos, Echirolles, France
[5] Univ. Bordeaux, LaBRI, Inria, Atos, Echirolles, France
Source
2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024 | 2024
Keywords
Message Passing Interface; High Performance Computing; Auto-tuning; Black-box optimization
DOI
10.1109/IPDPSW63119.2024.00141
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
MPI implementations expose a broad range of parameters that have a significant impact on performance, and the optimal values vary with the specific communication pattern. State-of-the-art solutions [25], [15] provide per-application tuning of these parameters, which requires performing the tuning for each application and redoing it each time the application changes. Here, we propose an application-agnostic method that leverages Bayesian Optimization, a black-box optimization technique, to discover the optimal parametrization of collective communication. We conducted experiments on two HPC platforms, where we tuned three Open MPI parameters for four distinct collective operations and 18 message sizes. The resulting tuning exhibits an average execution-time improvement of up to 48.4% compared to the default parametrization, closely aligning with the tuning achieved through exhaustive sampling. Additionally, our approach drastically reduces the tuning time by 95% compared to the exhaustive search, achieving a total search time of merely 6 hours instead of the original 134 hours. Furthermore, we apply our methodology to the NAS benchmarks, demonstrating its efficacy and application-agnosticity in real-world scenarios.
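As a rough illustration of the approach described in the abstract, the sketch below treats the execution time of a collective benchmark as a black-box objective and minimizes it with scikit-optimize's Gaussian-process optimizer. The abstract does not name the three Open MPI parameters tuned in the paper, so the `coll_tuned` MCA parameters chosen here (broadcast algorithm and segment size), the search ranges, the process count, and the `./osu_bcast` benchmark binary are all illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of Bayesian-optimization tuning of Open MPI collective
# parameters. Assumes scikit-optimize is installed and an OSU-style
# broadcast benchmark binary is available; the MCA parameters and ranges
# below are illustrative, not the exact three parameters from the paper.
import subprocess
import time

from skopt import gp_minimize
from skopt.space import Integer

# Search space: broadcast algorithm id and log2 of the segment size in bytes.
space = [
    Integer(0, 9, name="bcast_algorithm"),      # coll_tuned_bcast_algorithm
    Integer(10, 20, name="log2_segment_size"),  # coll_tuned_bcast_algorithm_segmentsize
]

def run_benchmark(params):
    """Run one broadcast benchmark under the given parametrization and
    return its wall-clock time (the black-box objective to minimize)."""
    algo, log2_seg = params
    cmd = [
        "mpirun", "-n", "64",
        "--mca", "coll_tuned_use_dynamic_rules", "1",
        "--mca", "coll_tuned_bcast_algorithm", str(algo),
        "--mca", "coll_tuned_bcast_algorithm_segmentsize", str(2 ** log2_seg),
        "./osu_bcast",  # assumed benchmark binary; replace with your own
    ]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

# Gaussian-process surrogate with an acquisition function needs far fewer
# evaluations than exhaustively sampling the full parameter grid.
result = gp_minimize(run_benchmark, space, n_calls=40, random_state=0)
print("best parameters:", result.x, "best time (s):", result.fun)
```

In this sketch each `gp_minimize` call evaluates one parametrization, so the evaluation budget (`n_calls=40`) directly bounds the tuning time, which is the mechanism behind the 95% reduction in search time that the abstract reports relative to exhaustive search.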
Pages: 771 - 780
Number of pages: 10
References (40 total; entries [11]-[20] shown)
  • [11] Chunduri, S. (2018). Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18).
  • [12] Dalibard, V., Schaarschmidt, M., & Yoneki, E. (2017). BOAT: Building Auto-Tuners with Structured Bayesian Optimization. Proceedings of the 26th International Conference on World Wide Web (WWW'17), 479-488.
  • [13] Desani, D., Gil-Costa, V., Marcondes, C. A. C., & Senger, H. (2016). Black-box Optimization of Hadoop Parameters Using Derivative-free Optimization. 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 43-50.
  • [14] Fagg, G. E., Bosilca, G., Pjesivac-Grbovic, J., Angskun, T., & Dongarra, J. J. (2007). Tuned: An Open MPI collective communications component. Distributed and Parallel Systems: From Cluster to Grid Computing, 65-72.
  • [15] Faraj, A. (2002). Proceedings of the 14th IASTED International Conference on Parallel and Distributed Computing and Systems, 729.
  • [16] Huang, S. (2021). Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences. Journal of Quality Technology, 53(4), 440-441.
  • [17] Hunold, S., & Steiner, S. (2022). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. 2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 123-128.
  • [18] Hunold, S., Bhatele, A., Bosilca, G., & Knees, P. (2020). Predicting MPI Collective Communication Performance Using Machine Learning. 2020 IEEE International Conference on Cluster Computing (CLUSTER 2020), 259-269.
  • [19] Hutter, F. (2019). Springer Series on Challenges in Machine Learning, 1. DOI: 10.1007/978-3-030-05318-5.
  • [20] Hutter, F. (2011). Learning and Intelligent Optimization: 5th International Conference, LION 5, Selected Papers, 507. DOI: 10.1007/978-3-642-25566-3_40.