Application-Agnostic Auto-tuning of Open MPI Collectives using Bayesian Optimization

Cited by: 0
Authors
Jeannot, Emmanuel [1 ]
Lemarinier, Pierre [2 ]
Mercier, Guillaume [3 ]
Robert-Hayek, Sophie [4 ]
Sartori, Richard [5 ]
Affiliations
[1] U Bordeaux, Labri, INRIA, Talence, France
[2] ATOS, Echirolles, France
[3] U Bordeaux, INRIA, Labri, Bordeaux INP, Talence, France
[4] U Lorraine, ATOS, Echirolles, France
[5] U Bordeaux, Labri, INRIA, ATOS, Echirolles, France
Source
2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024 | 2024
Keywords
Message Passing Interface; High Performance Computing; Auto-tuning; Black-box optimization;
DOI
10.1109/IPDPSW63119.2024.00141
CLC classification
TP3 [Computing technology; computer technology];
Discipline code
0812 ;
Abstract
MPI implementations encompass a broad range of parameters that have a significant impact on performance, and these parameters vary based on the specific communication pattern. State-of-the-art solutions [25], [15] provide a per-application tuning of these parameters, which requires tuning each application separately and redoing the tuning whenever the application changes. Here, we propose an application-agnostic method that leverages Bayesian Optimization, a black-box optimization technique, to discover the optimal parametrization of collective communication. We conducted experiments on two HPC platforms, where we tune three Open MPI parameters for four distinct collective operations and 18 message sizes. The results of our tuning exhibit an average execution-time improvement of up to 48.4% compared to the default parametrization, closely aligning with the tuning achieved through exhaustive sampling. Additionally, our approach drastically reduces the tuning time by 95% in contrast to the exhaustive search, achieving a total search time of merely 6 hours instead of the original 134 hours. Furthermore, we apply our methodology to the NAS benchmarks, demonstrating its efficacy and application agnosticity in real-world scenarios.
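The abstract describes a black-box loop: propose a configuration of collective-communication parameters, time it, update a surrogate model, and repeat. Below is a minimal, self-contained sketch of that kind of loop. Everything in it is an invented stand-in: the parameter names and ranges are illustrative (not actual Open MPI MCA parameters), the `measure` function is a synthetic timing model rather than a real benchmark, and the kernel-weighted surrogate is a crude substitute for the Gaussian-process models typically used in Bayesian Optimization.

```python
import math
import random

# Hypothetical discrete search space over three collective parameters.
# The names and value ranges are illustrative only.
SPACE = [(alg, seg, req)
         for alg in range(1, 6)              # collective algorithm id
         for seg in (0, 1024, 4096, 65536)   # segmentation size (bytes)
         for req in (1, 4, 16)]              # max outstanding requests

def features(cfg):
    """Map a configuration to a comparable numeric feature vector."""
    alg, seg, req = cfg
    return (alg / 5.0, math.log2(seg + 1) / 17.0, req / 16.0)

def measure(cfg, rng):
    """Synthetic stand-in for timing one collective (lower is better)."""
    alg, seg, req = cfg
    t = (10.0 + 3.0 * abs(alg - 3)
         + 0.5 * abs(math.log2(seg + 1) - 12.0)
         + 0.3 * abs(req - 4))
    return t + rng.uniform(0.0, 0.2)         # measurement noise

def predict(cfg, history):
    """Crude surrogate: kernel-weighted mean of observed times, plus the
    squared distance to the nearest observation as an uncertainty proxy."""
    x = features(cfg)
    num = den = 0.0
    nearest = float("inf")
    for c, y in history:
        d = sum((a - b) ** 2 for a, b in zip(features(c), x))
        nearest = min(nearest, d)
        w = math.exp(-d / 0.05)
        num += w * y
        den += w
    return num / den, nearest

def tune(budget=30, seed=0):
    rng = random.Random(seed)
    # Random initial design, then acquisition-driven sampling.
    history = [(cfg, measure(cfg, rng)) for cfg in rng.sample(SPACE, 5)]
    for _ in range(budget - 5):
        tried = {c for c, _ in history}
        # Lower-confidence-bound acquisition: favour low predicted time,
        # discounted by the uncertainty proxy to keep exploring.
        cfg = min((c for c in SPACE if c not in tried),
                  key=lambda c: (lambda mu, u: mu - 2.0 * u)(*predict(c, history)))
        history.append((cfg, measure(cfg, rng)))
    return min(history, key=lambda h: h[1])

best_cfg, best_time = tune()
```

A real deployment would replace `measure` with repeated timings of the target collective per message size, and would normally rely on an established Bayesian Optimization library rather than the hand-rolled surrogate above.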
Pages: 771 - 780
Page count: 10
References
40 entries in total
  • [31] A comparative study of black-box optimization heuristics for online tuning of high performance computing I/O accelerators
    Robert, Sophie
    Zertal, Soraya
    Vaumourin, Gregory
    Couvee, Philippe
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (16)
  • [32] A Comparison of Search Heuristics for Empirical Code Optimization
    Seymour, Keith
    You, Haihang
    Dongarra, Jack
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2008, : 421 - 429
  • [33] Taking the Human Out of the Loop: A Review of Bayesian Optimization
    Shahriari, Bobak
    Swersky, Kevin
    Wang, Ziyu
    Adams, Ryan P.
    de Freitas, Nando
    [J]. PROCEEDINGS OF THE IEEE, 2016, 104 (01) : 148 - 175
  • [34] Optimization of collective communication operations in MPICH
    Thakur, R
    Rabenseifner, R
    Gropp, W
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2005, 19 (01) : 49 - 66
  • [35] Multi-core Aware Optimization for MPI Collectives
    Tu, Bibo
    Zou, Ming
    Zhan, Hanfeng
    Zhao, Xiaofang
    Fan, Hanping
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2008, : 322 - 325
  • [36] Zheng, Wenxu
    [J]. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2019, P670, DOI 10.1109/HPCC/SmartCity/DSS.2019.00101
  • [39] ACCLAiM: Advancing the Practicality of MPI Collective Communication Autotuning Using Machine Learning
    Wilkins, Michael
    Guo, Yanfei
    Thakur, Rajeev
    Dinda, Peter
    Hardavellas, Nikos
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 161 - 171
  • [40] A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems
    Wilkins, Michael
    Guo, Yanfei
    Thakur, Rajeev
    Hardavellas, Nikos
    Dinda, Peter
    Si, Min
    [J]. PROCEEDINGS OF EXAMPI 2021: WORKSHOP ON EXASCALE MPI, 2021, : 36 - 45